Find jobs for research scientist

Research scientist Interview Questions in India

In a research scientist interview, you will be expected to show that you have the necessary technical knowledge and expertise pertaining to the specific position you are applying for. Some of the common topics include basic statistical methods, machine learning concepts and case study analysis. Also, the interviewer will most probably assess your communication and interpersonal skills, which are essential for effective teamwork and funding acquisition.

10,047 Research Scientist interview questions shared by candidates

Top Research Scientist Interview Questions & How to Answer

Here are three of the top research scientist interview questions and tips on how to answer them:

Question No. 1: What is X concept? What are its assumptions and how do you apply it?

How to answer: Basically, such an interview question asks for a textbook recall of a certain machine learning concept and its conditions and applications. Avoid overcomplicating it. Just give a simple and straightforward answer that shows that you have a solid grasp of the concept.

Question No. 2: Provide an example of a problem you faced in your previous role and how you solved it.

How to answer: The interviewer wants to evaluate your problem-solving skills. Carefully choose a challenging situation that best reflects your ability to solve problems and explain what you did to overcome it. Preferably, the problem should be one that is relevant to your desired position.

Question No. 3: How would you obtain research funding?

How to answer: If you had successfully secured research funding in the past, you can talk about some of the methods you used. If not, highlight the abilities you possess that can help you acquire funding, such as grant writing skills and networking skills.

Top Interview Questions

Sort: Relevance|Popular|Date
Data Scientist was asked...1 March 2016

Write an SQL query that makes recommendations using the pages that your friends liked. Assume you have two tables: a two-column table of users and their friends, and a two-column table of users and the pages they liked. It should not recommend pages you already like.

40 Answers

CREATE temporary table likes ( userid int not null, pageid int not null ) CREATE temporary table friends ( userid int not null, friendid int not null ) insert into likes VALUES (1, 101), (1, 201), (2, 201), (2, 301); insert into friends VALUES (1, 2); select f.userid, l.pageid from friends f join likes l ON l.userid = f.friendid LEFT JOIN likes r ON (r.userid = f.userid AND r.pageid = l.pageid) where r.pageid IS NULL; Less

select w.userid, w.pageid from ( select f.userid, l.pageid from rollups_new.friends f join rollups_new.likes l ON l.userid = f.friendid) w left join rollups_new.likes l on w.userid=l.userid and w.pageid=l.pageid where l.pageid is null Less

Use Except select f.user_id, l.page_id from friends f inner join likes l on f.fd_id = l.user_id group by f.user_id, l.page_id -- for each user, the unique pages that liked by their friends Except select user_id, page_id from likes Less

Show more responses

Data challenge was very similar to the ads analysis challenge on the book the collection of data science takehome challenge, so that was easy (if you have done your homework). SQL was: you have a table where you have date, user_id, song_id and count. It shows at the end of each day how many times in her history a user has listened to a given song. So count is cumulative sum. You have to update this on a daily basis based on a second table that records in real time when a user listens to a given song. Basically, at the end of each day, you go to this second table and pull a count of each user/song combination and then add this count to the first table that has the lifetime count. If it is the first time a user has listened to a given song, you won't have this pair in the lifetime table, so you have to create the pair there and then add the count of the last day. Onsite: lots of ads related and machine learning questions. How to build an ad model, how to test it, describe a model. I didn't do well in some of these.

18 Answers

Can't tell you the solution of the ads analysis challenge. I would recommend getting in touch with the book author though. It was really useful to prep for all these interviews. SQL is a full outer join between life time count and last day count and then sum the two. Less

Can you post here your solution for the ads analysis from the takehome challenge book. I also bought the book and was interested in comparing the solutions. Also can you post here how you solved the SQL question? Less

for the SQL, I think both should work. Outer join between lifetime count and new day count and then sum columns replacing NULLs with 0, or union all between those two, group by and then sum. Less

Show more responses

Consider a game with 2 players, A and B. Player A has 8 stones, player B has 6. Game proceeds as follows. First, A rolls a fair 6-sided die, and the number on the die determines how many stones A takes over from B. Next, B rolls the same die, and the exact same thing happens in reverse. This concludes the round. Whoever has more stones at the end of the round wins and the game is over. If players end up with equal # of stones at the end of the round, it is a tie and another round ensues. What is the probability that B wins in 1, 2, ..., n rounds?

16 Answers

Because at the beginning time, A has 8 and B has 6, so let A:x and B:y, then A:8+x-y and B:6-x+y; so there are 10/36 prob of B wins. And A wins prob is 21/36 and the equal prob for next round is 5/36. So for B wins at round prob is 10/36. And if they are equal and to have another round, the number has changed to 7 and 7. So A:7+x-y and B:7-x+y, so this time B wins has prob 15/36 and A wins has prob 15/36. And the equal to have another round is 6/36=1/6. So overall B wins in 2 rounds has prob 5/36*15/36. And for round 3,4,...etc, since after each equal round, the number will go back to 7 and 7 so the prob will not change. So B wins in round 3,4,...n has prob 5/36*(6/36)^(r-2)*15/36. r means the number of the total rounds. Less

So many answers...Here's my version: For round1, B win only if it gets 3 or more stones than A, which is (A,B) = (1,4) (1,5) (1, 6) (2, 5) (2,6) (3,6) which is 6 cases out of all 36 probabilities. So B has 1/6 chance to win. To draw, B needs to get exactly 2 stones more than A, which is (A, B) = (1,3) (2,4) (3,5) (4,6) or 1/9. Entering the second round, all stones should be equal, so the chance to draw become 1/6, and the chance for either to win is 5/12. So the final answer is (1/6, 1/9*5/12, (1/9)^2*5/12, .....(1/9)^(n-1)*5/12) ) Less

I don't get it. Shouldn't prob of B winning given it's tie at 1st round be 15/36? given it's tie at 1st round, at the 2nd round Nb > Na can happen if (B,A) is (2,1), (3,1/2),(4,1/2/3), (5,1/2/3/4),(6,1/2/3/4/5), which totals 15 out of 36. Less

Show more responses

Find the second largest element in a Binary Search Tree

15 Answers

The above answer is also wrong; Node findSceondLargest(Node root) { // If tree is null or is single node only, return null (no second largest) if (root==null || (root.left==null && root.right==null)) return null; Node parent = null, child = root; // find the right most child while (child.right!=null) { parent = child; child = child.right; } // if the right most child has no left child, then it's parent is second largest if (child.left==null) return parent; // otherwise, return left child's rightmost child as second largest child = child.left; while (child.right!=null) child = child.right; return child; } Less

find the right most element. If this is a right node with no children, return its parent. if this is not, return the largest element of its left child. Less

One addition is the situation where the tree has no right branch (root is largest). In this special case, it does not have a parent. So it's better to keep track of parent and current pointers, if different, the original method by the candidate works well, if the same (which means the root situation), find the largest of its left branch. Less

Show more responses

data question: dialoglog (userid int appid int type char , a flag either "imp" or "click" ds timestamp ) How would you access the quality of app? How to compute click-through rate (in mySQL)?

8 Answers

Assuming ctr is defined as total number clicks / total number of impressions (not counting each unique user's action) select appid, total_click_ct / total_imp_ct as ctr from ( select appid, count(distinct case when flag = 'imp' then 1 else 0 end) as total_imp_ct, count(distinct case when flag = 'click' then 1 else 0 end) as total_click_ct, from table where ts > x and ts < y group by appid) table2 order by ctr desc; Less

re: assessing the quality of an app, some ideas: if it's your company's app, you can look at numbers of downloads and server traffic. You can survey your users, measure profitability, and track how often your users refer others to get the app. if it's not your app, you can look at ratings in the app store, cost, read reviews, and look at how well it does against your competitors' cost and ratings. you can look at specific features: does it do what's needed? does it look good? does it have a bunch of unnecessary features that nobody uses or wants? Does it clutter up your storage or run very slowly, i.e. is it optimized? you can look at the company's reputation, if you don't know anything about them, look at their website, does it look professional? does it have demos? can you test the app with a free trial before you buy? Less

SELECT appid, sum(click) / count(*) AS crt FROM dialoglog GROUP BY appid;

Show more responses

Write a sorting algorithm for a numerical dataset in Python.

8 Answers

def mergeSort(lst): split = len(lst) / 2 left = lst[:split] right = lst[split:] if len(left) > 1 or len(right) > 1: left = mergeSort(left) right = mergeSort(right) return merge(left, right) def merge(A, B): i=0 j=0 sorted_list = [] while i < len(A) and j < len(B): if A[i] <= B[j]: sorted_list.append(A[i]) i += 1 else: sorted_list.append(B[j]) j += 1 if i < len(A): sorted_list.extend(A[i:]) elif j < len(B): sorted_list.extend(B[j:]) return sorted_list Less

list = ["1", "4", "2", "10", "5"] list = [ int(i) for i in list ] list.sort() print (list) Less

def mysort(arr): if len(arr) pivot] return mysort(left) + middle + mysort(right) Less

Show more responses

Given a table of friend requests sent and friend requests received, find the user with the most friends.

7 Answers

request_sent (sender, target, date) s request_received (approver, source, date) r > max( to get the most updated view in case of "unfriend" ** we'd still have a blind spot of unfriends who occurred and new requests werent sent. every friend is a result of a request sent and approved and adds a friend to 2 people (sender and approver). SELECT user, sum(cf) as num_friends FROM ( SELECT approver as user, count(distinct sender) as cf FROM s inner join r on s.sender=r.source and GROUP BY approver HAVING>max( UNION ALL SELECT sender as user, count(distinct approver) as cf FROM s inner join r on s.sender=r.source and GROUP BY sender HAVING>max( ) GROUP BY user Less

Why do you need to use both the tables? Cant, we achieve this with only the second table? Less

select userid, sum(friends) from (select approver as userid, count(distinct source) friends from request_received group by approver union select source, count(distinct approver) from request_received group BY source) f group by userid Less

Show more responses

SQL question: given a table of interaction between users (user_a | user_b | day), find number of users who had more than 5 interactions yesterday (assume there is only one unique interaction between a pair of users per day). Product Question: A user satisfaction survey was conducted for two groups of facebook users (each with 50 k sample size). Group1: who had enabled certain login security features Group 2: who had not enabled these security features. It was found that user satisfaction with group1 was 30% lower than with group 2. Why do you think so? Comment on how the survey was conducted?

7 Answers

select c.user_id , sum(c.cnt) as sum_cnt from ( SELECT USER_A as user_id,count(user_b) cnt FROM interactions group by user_a union all SELECT USER_B as user_id,count(user_a) cnt FROM interactions group by user_B) c group by c.user_id having sum(c.cnt)>5 ; Less

For the product question: This is an example of selection bias. The subset of users who opted in for the security features are not representative of the underlying population of FB users. They are probably more concerned about how their data is being used. Further, because the survey would have been completed on a voluntary basis, it represents another source of selection bias. These individuals are more likely to be engaged with the FB product than the general population, and care enough to complete the survey. Once again, these users are not a representative (or for that matter random) sample from the underlying population of FB users. This would explain why Group 1 had a much lower satisfaction rate as compared to Group 2, as they represent a specific subset of FB's users who care deeply about this issue, and by the virtue that they have the security feature on is indicative of the fact that they take data privacy seriously. For the SQL question: This is probably not the most efficient, but is the only thing that came to my mind: with all_users as ( select userA as userID, count(1) as interactions from table where date = current_date - interval 'day' group by userA union all select userB as userID, count(1) as interactions from table where date = current_date - interval 'day' group by userB ), engaged as ( select userID, sum(interactions) as totalInteractions from all_users group by userID having totalInteractions >= 5 ) select count(1) as greaterThanFive from engaged Less

Data : user_1 user_2 day A B 1 A C 1 B D 1 B F 1 C D 1 D F 1 E A 1 F B 1 Query : select usr , sum(cnt) as interactions from ( select user_1 as usr, day , count(*) as cnt from f group by user_1 , day union all select user_2 as usr, day , count(*) as cnt from f group by user_2 , day )) group by usr having sum(cnt) >= 5 Less

Show more responses

Given a table with columns country (with two-letter country abbreviation), count of requests sent, percentage of requests sent failed, condense down to this data grouped by country: country (one row per country), total count of requests sent, total count of requests sent failed.

7 Answers

SELECT country, SUM(count_of_request_sent) AS total_count_of_request_sent, SUM((count_of_request_sent * percent_of_request_failed) / 100) AS total_count_of_request_failed FROM table GROUP BY country; Less

SELECT country, SUM(requests_sent), SUM((percent_failed/100) * requests_sent) FROM requests GROUP BY country; Less

My assumption is that the requests_sent columns is not an integer. It might just have status as per 'failed", "successful" etc. Best to verify from the interviewer. That means we may not be able to take the sum of teh request_sent but have to use count instead. WITH CTE AS ( SELECT country, request_sent, (perc_failed * count(request_sent)) AS total_failed FROM T1 ) SELECT country, COUNT(request_sent), SUM(total_failed) GROUP BY 1 Less

Show more responses
CKM Analytix

Given 25 swimmers and a pool with five lanes, what is the minimum number of heats needed to determine the three fastest swimmers in the group?

6 Answers

Assuming, "fastest time" means the use of a timing device. Sticking to the exact wording of the question, 5 heats are required. Pick the 3 fastest times of the 5 heats. Less

Assuming no timing devices, and additionally, that a swimmer will swim at the same speed in each heat, the answer is 9. Taking the top 3 finishers from the first 5 heats, that leaves 15 swimmers. Because the 3rd-place finisher in any of these heats could be faster than all swimmers in other heats, additional heats must be run. The 2nd round of 3 heats would include the 5 first-place finishers, the 5 second-place finishers, and the 5 third-place finishers. Taking the top 3 from each of those heats, that leaves 9 swimmers. You would eliminate, however, the swimmers who finished 2nd and 3rd in the two heats that included the 2nd and 3rd place finishers. They necessarily would be slower than at least 3 other swimmers in the competition. You would also conclude that the winner of the Round 2 heat that included the 5 1st-place finishers is necessarily the fastest swimmer in the competition. That leaves 4 swimmers on the bubble after a total of 8 heats. The swimmer who won the Round 2 heat of 1st-place finishers, plus the top 2 finishers in the 9th and final heat of 4 swimmers, would be the fastest 3. Less

Case 1: With timer - 5 heats - get the times, choose the top 3. Case 2: No timer - the answer should be 7. Process: Step 1: Divide everyone into 5 groups of 5 each. Run 5 heats - Lets name them Group 1 - 5 Step 2: Take all the first rankers of each heat (run above) and run another heat Now in this heat whoever is 4th and 5th are eliminated (with their groups) We also know that the one who is first in this heat is definitely first in the 25 people. Lets assume for simplicity that group 1 has the first ranker of this heat, group 2 has the second ranker and group 3 has the third ranker. Step 3: Take the 2nd and 3rd player from the group 1 (group with 1st ranker) and 1st and 2nd ranker of group 2 and first ranker of group 3 Basically the 3rd ranker of group 2 is not in the race because there are 3 people who are faster than this player (G1 P1, G2 P1, G2 P2) Similarly 2nd and 3rd of group 3 will also not qualify since there are three people faster than them already (G1 P1, G2 P1, G3 P1) In this heat the top two will be the 2nd and 3rd of the 25 people. Less

Show more responses
Viewing 1 - 10 of 10,047 interview questions