Data Mining Engineer Interview Questions
57 Data Mining Engineer interview questions shared by candidates
Given a set of strings, list the strings that are anagrams of each other.
2 Answers
Sorting the strings is not optimal because each sort is O(N log N), where N is the number of characters in the word. A better approach is to encode each word as a hash table of character frequencies, which takes O(N) per word; words with identical frequency tables are anagrams of each other.

Sort the characters of each string and compare the sorted results.
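A minimal sketch of the frequency-table approach, assuming anagrams are compared case-sensitively on the exact characters (no stripping of spaces or punctuation). `collections.Counter` builds each word's frequency table in O(N), and a `frozenset` of its (character, count) pairs makes that table hashable so it can act as a dictionary key. The function name `anagram_groups` and the sample words are illustrative.

```python
from collections import Counter, defaultdict

def anagram_groups(words):
    """Group words that are anagrams of at least one other word in the input."""
    groups = defaultdict(list)
    for w in words:
        # Character-frequency table, built in O(len(w)); frozenset makes it hashable.
        key = frozenset(Counter(w).items())
        groups[key].append(w)
    # Keep only groups with two or more members.
    return [g for g in groups.values() if len(g) > 1]

print(anagram_groups(["listen", "silent", "enlist", "google", "gooegl", "cat"]))
# [['listen', 'silent', 'enlist'], ['google', 'gooegl']]
```

Grouping by key in a dictionary processes the whole set in one pass over the characters, instead of comparing every pair of strings.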

How would you design a recommendation system (like Amazon's)?
2 Answers
Use collaborative filtering to compare a user's preferences with those of other users. If users A and B are similar, we can recommend items that B liked to A.

Why the downvote on the other answer? It's right. Collaborative filtering is the most common strategy for recommendation systems. For example, if user A bought some items, and user B bought the same items plus one more, show that extra item to user A.
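A minimal user-user collaborative-filtering sketch, assuming implicit feedback (a set of purchased item IDs per user) and cosine similarity between those sets. Each item the target user has not bought is scored by summing the similarities of the other users who bought it. The function names and the toy purchase data are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two users' purchase sets, treated as 0/1 vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def recommend(purchases, user, k=2):
    """Score items the user has not bought by summing similarities of buyers."""
    mine = purchases[user]
    scores = {}
    for other, items in purchases.items():
        if other == user:
            continue
        sim = cosine(mine, items)
        if sim == 0.0:
            continue  # no overlap, so this user tells us nothing about `user`
        for item in items - mine:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]

purchases = {
    "A": {"book", "lamp"},
    "B": {"book", "lamp", "desk"},
    "C": {"lamp", "pen"},
    "D": {"sofa"},
}
print(recommend(purchases, "A"))  # ['desk', 'pen']
```

B overlaps A more (similarity about 0.82) than C does (0.5), so B's extra item ranks first; D shares nothing with A and contributes no recommendations.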

Implement a sampling function with nominal distribution.
2 Answers
I think you mean the normal distribution! If you are using R, call set.seed() first so the draws are reproducible, then use rnorm() with the sample size, mean, and SD. For example:
> set.seed(123)
> rnorm(100, 2, 5)

I'm the original poster; sorry for the typo. I actually meant the multinomial distribution. The follow-up question was: if the probabilities are highly skewed, how would you speed up your algorithm? You can find both answers on Wikipedia. :)
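A minimal sketch of categorical (multinomial) sampling by inverse transform: build the cumulative distribution once, then map a uniform draw to a category with binary search. The helper names `make_sampler` and `multinomial` are hypothetical; Python's standard library also offers `random.choices` with a `weights` argument for single categorical draws.

```python
import bisect
import itertools
import random

def make_sampler(probs, rng=random):
    """Inverse-CDF sampler for one categorical draw.

    Setup is O(k) for k categories; each draw is a binary search, O(log k).
    """
    cdf = list(itertools.accumulate(probs))
    total = cdf[-1]  # tolerate weights that do not sum exactly to 1
    last = len(cdf) - 1

    def sample():
        u = rng.random() * total
        # Index of the first cumulative value greater than u; the clamp guards
        # against floating-point rounding pushing u up to total.
        return min(bisect.bisect_right(cdf, u), last)

    return sample

def multinomial(n, probs, rng=random):
    """Counts per category after n independent categorical draws."""
    draw = make_sampler(probs, rng)
    counts = [0] * len(probs)
    for _ in range(n):
        counts[draw()] += 1
    return counts

print(multinomial(10, [0.7, 0.2, 0.1], random.Random(42)))
```

For the skewed follow-up, two common options: scan the categories in decreasing order of probability, so most draws stop after a comparison or two, or use Walker's alias method, which gives O(1) draws after O(k) setup.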

Only one easy/medium LeetCode question during the coding module.
1 Answer
I got the optimal solution (with a couple of nudges, but with time to spare), yet apparently this was the only module where I did not "meet expectations." It's a shame that some presumably small mistake in my first hour was enough to discount an otherwise very strong 6-hour interview.

Difference between L1 and L2 regularization.
1 Answer
Mathematically, both add a regularization term to the loss to keep the coefficients from fitting the training data so closely that the model overfits. The difference is the form of that term: L2 is the sum of the squares of the weights, while L1 is the sum of the absolute values of the weights.
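To make the difference concrete, a minimal sketch with weights as plain Python lists (function names hypothetical): the two penalties, plus the proximal step each penalty induces in proximal gradient descent. The L1 step (soft-thresholding) sets small weights exactly to zero, which is why L1 tends to produce sparse models, while the L2 step only scales every weight toward zero.

```python
def l1_penalty(w, lam):
    """lam * sum of absolute values of the weights (lasso)."""
    return lam * sum(abs(x) for x in w)

def l2_penalty(w, lam):
    """lam * sum of squared weights (ridge)."""
    return lam * sum(x * x for x in w)

def prox_l1(w, t):
    """Soft-thresholding: argmin_z 0.5*||z - w||^2 + t*||z||_1."""
    return [x - t if x > t else x + t if x < -t else 0.0 for x in w]

def prox_l2(w, t):
    """argmin_z 0.5*||z - w||^2 + t*||z||^2, i.e. uniform shrinkage."""
    return [x / (1 + 2 * t) for x in w]

w = [3.0, -0.4, 0.2, -2.0]
print(prox_l1(w, 0.5))  # [2.5, 0.0, 0.0, -1.5]  small weights become exactly 0
print(prox_l2(w, 0.5))  # [1.5, -0.2, 0.1, -1.0]  every weight shrinks, none hits 0
```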

Design a recommendation system.
1 Answer
It depends on the volume of data we have. Assuming there is a lot of data on hand, it is best to use collaborative filtering. This involves finding users or items similar to the ones we are making recommendations for, and using a similarity-weighted average of how much they liked a product to decide whether to recommend it. It can be implemented as user-user collaborative filtering, where we find similar users, or as item-item collaborative filtering, where we find similar items. If we have less data to work with, it is better to use content-based filtering, where we build profiles for the users and recommend products whose features match those profiles.
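For the small-data case, a minimal content-based sketch, assuming each item is described by a hand-made feature vector (here hypothetical genre scores: action, comedy, romance). The user profile is the average of the feature vectors of the items the user liked, and unseen items are ranked by cosine similarity to that profile. Item IDs and function names are illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def user_profile(liked, item_features):
    """Average the feature vectors of the items the user liked."""
    vecs = [item_features[i] for i in liked]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def recommend_content(liked, item_features, k=2):
    """Rank unseen items by cosine similarity to the user's profile."""
    profile = user_profile(liked, item_features)
    unseen = [i for i in item_features if i not in liked]
    return sorted(unseen, key=lambda i: -cosine(profile, item_features[i]))[:k]

# Hypothetical items with features [action, comedy, romance].
item_features = {
    "a1": [1.0, 0.0, 0.0],
    "a2": [0.9, 0.1, 0.0],
    "ac": [0.6, 0.8, 0.0],
    "c1": [0.1, 1.0, 0.0],
    "r1": [0.0, 0.3, 0.9],
}
print(recommend_content(["a1"], item_features))  # ['a2', 'ac']
```

Because it needs only the item features and one user's history, this approach works before there is enough interaction data for collaborative filtering.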

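Find the point where the sum of distances to all other points is minimized.
1 Answer
The point closest to the mean of all the points. Note that this is only a heuristic: the mean minimizes the sum of squared distances, not the sum of distances, so the point closest to the mean is not guaranteed to be the minimizer.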
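A brute-force check, assuming the answer must be one of the given points and distances are Euclidean: compute each point's total distance to all the others and take the minimum (the medoid), which costs O(n²) distance evaluations. Comparing it with the point nearest the mean shows the two can differ. Function names and toy data are hypothetical; `math.dist` requires Python 3.8+.

```python
import math

def total_distance(p, points):
    """Sum of Euclidean distances from p to every point (p itself adds 0)."""
    return sum(math.dist(p, q) for q in points)

def medoid(points):
    """Exact answer among the given points: smallest total distance, O(n^2)."""
    return min(points, key=lambda p: total_distance(p, points))

def nearest_to_mean(points):
    """Heuristic: the given point closest to the centroid."""
    n, dims = len(points), len(points[0])
    centroid = tuple(sum(p[i] for p in points) / n for i in range(dims))
    return min(points, key=lambda p: math.dist(p, centroid))

pts = [(0, 0), (1, 0), (2, 0), (8, 0), (100, 0)]
print(medoid(pts))           # (2, 0): total distance 107
print(nearest_to_mean(pts))  # (8, 0): total distance 113
```

The outlier at x = 100 drags the mean to 22.2, so the heuristic picks (8, 0) even though (2, 0) has a smaller total distance. If the point may lie anywhere rather than at an input point, the minimizer is the geometric median, which has no closed form and is usually approximated iteratively (for example with Weiszfeld's algorithm).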