# Senior Data Scientist Interview Questions


Senior Data Scientist interview questions shared by candidates

### They check for your attitude, your approach and your anxiety level! I solved both the case studies and I know I rocked the behavioral and data challenge rounds, but I still ended up not being selected. Why?

7 Answers↳

Can you please tell me more about the HackerRank round? What type of questions were there? Also, what about the SQL?

↳

Hi, you mentioned that you have cases compiled together. Can you please share them here? It would be really helpful. Thanks in advance.

↳

I didn't practice the case-solving approach enough. Though I had the math right, my approach was messy, with too many sheets of paper and too much switching between them.

### How would you test if survey responses were filled at random by certain individuals, as opposed to truthful selections?

4 Answers↳

This is a very basic psychometrics question. Calculate Cronbach's alpha for the survey items. If it is low (below .5), it is very likely that the questions were answered at random.
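As a rough illustration of the calculation mentioned above, here is a minimal sketch of Cronbach's alpha using only the Python standard library. The function name `cronbach_alpha` and the two small response matrices are made up for illustration. Note that alpha is computed across all respondents, so on its own it describes the survey as a whole rather than flagging individual respondents.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row per respondent, one score per item).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: respondents who answer all three items consistently...
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4], [5, 5, 5]]
# ...versus respondents whose answers show no agreement between items.
scattered = [[1, 5, 3], [2, 1, 4], [3, 4, 1], [4, 2, 5], [5, 3, 2]]

alpha_consistent = cronbach_alpha(consistent)
alpha_scattered = cronbach_alpha(scattered)
```

With these made-up inputs the consistent matrix gives an alpha of about 1, while the scattered one falls far below the 0.5 threshold mentioned in the answer. The sketch does not handle the degenerate case where all total scores are identical (zero variance).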

↳

I would design the test so that certain information is asked in two different ways. If the two answers disagree with each other, I would seriously doubt the validity of the answers.

↳

We need to plot histograms of the questions in the survey to see the distribution of answers for each question. The histograms will likely follow a normal distribution if the selections are truthful. If a response has more than half of its answers located outside the 95% confidence interval of the corresponding histograms, the response will be categorized as random fall out of mean plus tw

### Case 1: Given APR, interchange fee, average monthly balance, average spend every month, and a loss rate of 3%, calculate the profit per customer. Now justify whether it is profitable to give cash back to the customers.

Case 2: There are 2 ways of campaigning for credit cards:

1. Email: 10% of applicants become customers; each representative can verify 10 email applications in an hour and is paid $25/hr.
2. Chat: 20% of applicants become customers; each representative can respond to 4 applications in an hour and is paid $25/hr.

Profit per customer in both cases is $100. Which one is more profitable, email or chat? Draw the graph of profit vs. number of applicants. Consider a scenario where there are only 5 representatives to handle applications. In this case, which one is more profitable, email or chat? Calculate the break-even point for the number of representatives at which chat becomes more profitable than email.

4 Answers↳

Hi, could you please let me know if the candidates who were selected had a 5th round?

↳

First question: Email: 10% * 10 = 1, so a representative gets 1 customer for $25 per hour. Chat: 20% * 4 = 0.8, so a representative gets 0.8 customers for $25 per hour. Email is more profitable per representative-hour. Second question: Email costs $25 to get one customer. Chat costs $25/0.8 = $31.25 to get one customer. The profit per customer is $100 in both cases; assume that is before paying the representatives. Email graph: a line with slope (100 - 25)/10, since 1 in 10 email applicants converts. Chat graph: a line with slope (100 - 31.25)/5, since 1 in 5 chat applicants converts. Not sure how to solve the rest of the questions.
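The Case 2 arithmetic can be sketched in a few lines of Python. This is only one reading of the case: it assumes the $100 profit per customer is before representative pay, that every representative is paid for the full hour, and that each applicant is handled once. The names `CHANNELS`, `profit_per_rep_hour`, `profit_per_applicant` and `hourly_profit` are made up for illustration.

```python
PROFIT_PER_CUSTOMER = 100.0  # from the case, assumed to be before representative pay
WAGE_PER_HOUR = 25.0         # from the case

CHANNELS = {
    "email": {"conversion": 0.10, "apps_per_hour": 10},
    "chat": {"conversion": 0.20, "apps_per_hour": 4},
}


def profit_per_rep_hour(channel):
    """Net profit from one fully busy representative working one hour."""
    c = CHANNELS[channel]
    return c["conversion"] * c["apps_per_hour"] * PROFIT_PER_CUSTOMER - WAGE_PER_HOUR


def profit_per_applicant(channel):
    """Net profit per application handled (the slope of profit vs. applicants)."""
    c = CHANNELS[channel]
    return c["conversion"] * PROFIT_PER_CUSTOMER - WAGE_PER_HOUR / c["apps_per_hour"]


def hourly_profit(channel, applicants, reps):
    """Net profit in one hour when `reps` representatives face `applicants` applications.

    Assumption: all representatives are paid for the hour, and applications
    beyond their combined capacity are simply not handled.
    """
    c = CHANNELS[channel]
    handled = min(applicants, reps * c["apps_per_hour"])
    return c["conversion"] * handled * PROFIT_PER_CUSTOMER - reps * WAGE_PER_HOUR
```

Under these assumptions email earns more per representative-hour (75 vs. 55), so with only 5 representatives and plenty of applicants email comes out ahead (375 vs. 275 per hour). Per applicant handled, chat earns more (13.75 vs. 7.50), which is why the answer to "which is more profitable" depends on whether representatives or applicants are the scarce resource.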

↳

Basic profit and loss calculations. The case round is 1 hour and these can be completed in 45 minutes. They assess your thought process and your accuracy in doing the calculations. But I'm not sure on what basis they finally select someone for the fifth round. You may do well in the case round and still not be called for the fifth round.

### A gas station has 30 gallons of gasoline worth $1.20 per gallon and some worth $1.40 per gallon. How many gallons of the $1.40 brand must the owner mix in to produce gasoline that costs $1.28 per gallon?

4 Answers↳

x + y = 30 (amount of gas the station has available), so x = 30 - y. Then (1.2x + 1.4y)/30 = 1.28; (1.2(30 - y) + 1.4y)/30 = 1.28; 1.2(30 - y) + 1.4y = 38.4; 36 - 1.2y + 1.4y = 38.4; 0.2y = 2.4; y = 12.

↳

(1.2x + 1.4y) / (x + y) = 1.28; 1.2x + 1.4y = 1.28x + 1.28y; 1.4y - 1.28y = 1.28x - 1.2x; 0.12y = 0.08x; y = (2/3)x. If x = 30, y = 30 * (2/3) = 20.

↳

20. If it were 30, the price would be 1.30, so it has to be smaller than 30.
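The answers above reach different numbers because they read the "30 gallons" differently. A minimal Python sketch of both readings follows; the function names are made up for illustration.

```python
def gallons_to_add(base_gallons, base_price, added_price, target_price):
    """Gallons of the pricier gas to add to a fixed amount of the cheaper gas.

    Solves (base_price * x + added_price * y) / (x + y) = target_price for y,
    with x = base_gallons.
    """
    return base_gallons * (target_price - base_price) / (added_price - target_price)


def gallons_within_total(total_gallons, base_price, added_price, target_price):
    """Gallons of the pricier gas when the final blend must total `total_gallons`.

    Solves the same equation with x + y = total_gallons.
    """
    return total_gallons * (target_price - base_price) / (added_price - base_price)


# Reading 1: the station already has 30 gallons of the $1.20 gas.
added = gallons_to_add(30, 1.20, 1.40, 1.28)
# Reading 2: the finished blend is 30 gallons in total.
within = gallons_within_total(30, 1.20, 1.40, 1.28)
```

Reading 1 matches the question as worded and gives about 20 gallons; reading 2 gives about 12 gallons.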

### How would you build and test a metric to compare two users' ranked lists of movie/TV show preferences?

4 Answers↳

1) Develop a list of shows/movies that are representative of different taste categories (more on this later). 2) Obtain rankings of the items in the list from the 2 users. 3) Use Spearman's rho (or another statistic that works with rankings) to assess the dependence/congruence between the 2 people's rankings. To find shows/movies to include in the measurement instrument, maybe do a cluster analysis on a large number of viewers' viewing habits.
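As a sketch of step 3, Spearman's rho for two complete rankings without ties can be computed with the standard shortcut formula. The function name `spearman_rho` and the placeholder titles are made up for illustration; rankings with ties or with different item sets would need a different treatment.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two rankings of the same items, assuming no ties.

    rank_a, rank_b: dicts mapping title -> rank (1 = most preferred).
    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    """
    if rank_a.keys() != rank_b.keys():
        raise ValueError("both users must rank the same titles")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least 2 titles")
    squared_diffs = sum((rank_a[title] - rank_b[title]) ** 2 for title in rank_a)
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


# Placeholder titles; user_2 ranks them in exactly the opposite order of user_1.
user_1 = {"title_a": 1, "title_b": 2, "title_c": 3, "title_d": 4, "title_e": 5}
user_2 = {"title_a": 5, "title_b": 4, "title_c": 3, "title_d": 2, "title_e": 1}

same_taste = spearman_rho(user_1, user_1)      # identical rankings
opposite_taste = spearman_rho(user_1, user_2)  # fully reversed rankings
```

Identical rankings give 1 and fully reversed rankings give -1, so the value can be read as a similarity score between the two users.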

↳

Look at the mean average precision of the movies that the users watch out of the rankings. So if, out of 10 recommended movies, one user prefers the third and the other user prefers the sixth, the recommendation engine of the user who preferred the third would be better. InterviewQuery.com has a more in-depth answer.

↳

It's essential to demonstrate that you can really go deep... there are plenty of followup questions and (sometimes tangential) angles to explore. There's a lot of Senior Data Scientist experts who've worked at Netflix, who provide this sort of practice through mock interviews. There's a whole list of them curated on Prepfully. prepfully.com/practice-interviews

### 1. Given the first sample below, pull the unique statuses that show up consecutively 3 times, e.g. from the sample, the output would be 'active', 'expired'. 2. Given the second sample below, determine which employees are in the building at 10:30.

Sample for question 1:

| id | status |
|----|---------|
| 1 | active |
| 2 | active |
| 3 | active |
| 4 | pending |
| 5 | expired |
| 6 | expired |
| 7 | expired |
| 8 | pending |

Sample for question 2:

| employee | in_out | time |
|----------|--------|-------|
| A | IN | 6:00 |
| B | IN | 7:00 |
| A | OUT | 8:00 |
| C | IN | 9:30 |
| A | IN | 9:00 |
| A | OUT | 10:00 |
| B | OUT | 11:00 |
| C | OUT | 10:00 |

4 Answers↳

I was perturbed since I thought this was going to be a Behavioral Interview. I could not answer.

↳

`select distinct status from (select *, case when status = lead(status, 1) over (order by id) and lead(status, 1) over (order by id) = lead(status, 2) over (order by id) then 1 else 0 end as consecutive from tab) t where consecutive = 1`
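For comparison, the same "three in a row" logic can be sketched in Python with `itertools.groupby`, using the sample rows from the question. The function name `statuses_with_run` is made up for illustration, and a run longer than 3 is also counted, which matches the behavior of the `lead`-based query.

```python
from itertools import groupby


def statuses_with_run(rows, run_length=3):
    """Return statuses that appear at least `run_length` times in a row, ordered by id."""
    ordered = [status for _, status in sorted(rows)]  # sort by id
    found = []
    # groupby collapses each run of equal consecutive statuses into one group.
    for status, run in groupby(ordered):
        if len(list(run)) >= run_length and status not in found:
            found.append(status)
    return found


# Sample rows (id, status) from the question.
rows = [
    (1, "active"), (2, "active"), (3, "active"), (4, "pending"),
    (5, "expired"), (6, "expired"), (7, "expired"), (8, "pending"),
]

result = statuses_with_run(rows)  # ['active', 'expired']
```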

↳

`with cte as (select *, dense_rank() over (partition by employee order by time) as rnk from tab) select distinct a.employee from cte as a, cte as b where a.employee = b.employee and a.in_out = 'IN' and b.in_out = 'OUT' and a.rnk = b.rnk - 1 and a.time <= '10:30' and b.time > '10:30'` This assumes `time` is a time-typed column and that every IN has a later OUT.
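An alternative way to think about the second question is "what was each employee's last event at or before 10:30?". The sketch below applies that idea in Python to the sample rows from the question; the function names `to_minutes` and `employees_in_building` are made up for illustration.

```python
def to_minutes(hhmm):
    """Convert an 'H:MM' string into minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def employees_in_building(events, at):
    """Employees whose most recent event at or before `at` is an IN."""
    cutoff = to_minutes(at)
    last_direction = {}
    # Walk the events in time order, remembering each employee's latest IN/OUT.
    for employee, direction, time in sorted(events, key=lambda e: to_minutes(e[2])):
        if to_minutes(time) <= cutoff:
            last_direction[employee] = direction
    return sorted(name for name, d in last_direction.items() if d == "IN")


# Sample rows (employee, in_out, time) from the question.
events = [
    ("A", "IN", "6:00"), ("B", "IN", "7:00"), ("A", "OUT", "8:00"),
    ("C", "IN", "9:30"), ("A", "IN", "9:00"), ("A", "OUT", "10:00"),
    ("B", "OUT", "11:00"), ("C", "OUT", "10:00"),
]

present = employees_in_building(events, "10:30")  # ['B']
```

Unlike pairing each IN with the next OUT, this approach also counts an employee who has entered but has no OUT row yet.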

### The percentage of female customer base

3 Answers↳

Wrote the SQL query to answer this question

↳

Do you have any details on Python questions?

↳

You need demographics data for this. Query would be fairly simple
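As a minimal sketch of the calculation, assuming the demographics data exposes a gender field per customer (the field name `gender`, the value `"F"` and the sample records are all hypothetical):

```python
def female_percentage(customers):
    """Percentage of customers whose (hypothetical) `gender` field equals 'F'."""
    if not customers:
        raise ValueError("no customers to count")
    female = sum(1 for customer in customers if customer.get("gender") == "F")
    return 100 * female / len(customers)


# Made-up records for illustration only.
customers = [
    {"id": 1, "gender": "F"},
    {"id": 2, "gender": "M"},
    {"id": 3, "gender": "F"},
    {"id": 4, "gender": "M"},
    {"id": 5, "gender": None},  # unknown gender still counts in the denominator here
]

share = female_percentage(customers)  # 40.0
```

Whether customers with unknown gender belong in the denominator is a design choice worth raising with the interviewer.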

### Prepare 6-7 situations for behavioral questions and be flexible if the question is not asked in the way you prepared for. Try to find the trait they asked about among your prepared examples. Cases are easy; if you make a mistake and feel your answer is awkward, ask for help and fix your calculation. I did make a mistake: a typo carried into the next step's calculation. I found out, asked for help, and worked out what was wrong with the formula I wrote.

3 Answers↳

Case 1: I'm assuming the situation is that Capital One is launching a credit card with the department store. Given the data, I'm assuming (I rephrased the questions) the questions are: 1) What is the profit for Capital One? 2) What is the break-even point? 3) What will be the impact of a progressive opening bonus based on accounts opened? 4) What does the profit curve look like? 5) What is the maximum number of accounts and price?

My answer: Hmm... These are interesting questions. Since I don't really know the business structure of credit cards and department stores, I would first ask what the nature of the business is and ask for definitions of any terminology that might be alien to me. For example, I would ask what the opening bonus is. Is the interest cost a cost to customers, to the department store, or to Capital One? Once I had identified the problem type, I would apply a framework.

To answer question 1), I would apply the profitability framework, assuming a linear profit function. Profit can be segmented into total revenue and total cost. Revenue can then be segmented into revenue per unit and number of units sold, and I would ask for both. Cost can be segmented into cost per unit and number of units sold, and also into fixed cost and variable cost; I would ask for that information as well. Given this information, it is pretty much straightforward elementary mathematics. The equation might have several variables, depending on what the final revenue and cost structure is.

To answer question 2), we know the profit function from question 1), so we need to know when profit equals zero. If the unknown is Q (= number of units), solve for Q when profit is nil. An algebra problem.

To answer question 3), I would first ask what the relationship with quantity (or volume) is. For example, for an increment in accounts opened, what is the opening bonus? Profit will vary according to this quantity, and because the bonus changes with volume the profit equation will NOT be linear.

To answer question 4), if you know the relationship from question 3), you will be able to draw the profit curve on a plane. As quantity (number of accounts opened) varies, the profit will vary since it is not a constant.

To answer question 5), the profit function should be a quadratic function; take its derivative with respect to price and set it equal to zero, and you should be able to find the profit-maximizing price. Once this is found, you can plug it back into the quantity function and find the corresponding quantity.

If someone finds errors in my answer, please post comments.
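The break-even and maximization steps described in this answer can be sketched as follows. The case's actual numbers are not given, so every figure below is hypothetical, as are the function names `break_even_quantity` and `profit_maximizing_price`.

```python
def break_even_quantity(fixed_cost, revenue_per_unit, variable_cost_per_unit):
    """Quantity Q at which a linear profit (r - c) * Q - F equals zero."""
    margin = revenue_per_unit - variable_cost_per_unit
    if margin <= 0:
        raise ValueError("no break-even point: the unit margin is not positive")
    return fixed_cost / margin


def profit_maximizing_price(a, b):
    """Price that maximizes a quadratic profit a * p**2 + b * p + c, with a < 0.

    Setting the derivative 2 * a * p + b to zero gives p = -b / (2 * a).
    """
    if a >= 0:
        raise ValueError("profit has no maximum unless a < 0")
    return -b / (2 * a)


# Hypothetical inputs: $1,000 fixed cost, $50 revenue and $30 variable cost per account.
q_star = break_even_quantity(1000, 50, 30)   # 50.0 accounts
# Hypothetical quadratic profit: -2 * p**2 + 40 * p - 50.
p_star = profit_maximizing_price(-2, 40)     # 10.0
```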

↳

Case 2: This is an interesting case. I would ask the interviewer where to start. If the interviewer does not provide a starting point, I would apply a general framework.

To answer the first question, on the pros and cons (risk) of the particular situation, I would ask the interviewer why this situation occurred and find out whether it is firm-specific or industry-wide. In particular, who is the customer? I want to find the segment size, growth rates and market share for different types of customers. I would also ask for current-year and historical numbers on customer segment size, growth rates and market share in order to identify demand trends. Other questions would cover the needs of each customer segment, the price they are willing to pay, etc. I would look at the company and its capabilities, cost structure and so forth. Hopefully, by identifying key trends, we can build a scorecard, count the pros and cons, and state whether there are advantages and what the caveats are.

To calculate profitability, I would use the profitability framework, where profit = total revenue - total cost, break down the segments there and ask for information on each revenue/cost segment. Once those are known, calculating profit is straightforward.

For the next question, I would evaluate each option separately. Offering free months to increase quantity demanded, ceteris paribus, will affect the profit function. Then find the maximum profit. I would do the same analysis for option 2. If the maximum profit for option 1 is greater than for option 2 then, assuming the CEO is rational, option 1 would be the better way to increase profit, and vice versa.

For the last question, I'm not sure what the original proposal is, but assuming that we are conducting a break-even analysis of option 1 and option 2, we may look at this as a system of equations. Setting each option's profit function to nil gives that option's own break-even quantity. To compare the options, set the two profit functions equal to each other and solve for the unknown variable, quantity; this is the point where both curves cross.

If someone finds errors in my answer, please post comments.

↳

What topics are covered during the screening maths? Do you recall the 5 math questions?