Data Scientist Interview Questions in United States | Glassdoor.co.in

# Data Scientist Interview Questions in United States

2,770 data scientist interview questions shared by candidates

## Top Interview Questions


12 Sep 2013

25 Feb 2012
Find the second largest element in a Binary Search Tree

16 Answers

Find the rightmost element. If this is a right node with no children, return its parent. If it is not, return the largest element of its left child.

One addition is the situation where the tree has no right branch (the root is the largest). In this special case, it does not have a parent. So it's better to keep track of parent and current pointers: if they are different, the original method by the candidate works well; if they are the same (which means the root situation), find the largest element of its left branch.

    if (root == null || !root.hasRightChild()) {
        return null;
    } else {
        return findSecondGreatest(root, root.getValue());
    }

    value findSecondGreatest(Node curr, value oldValue) {
        if (curr.hasRightChild()) {
            return findSecondGreatest(curr.getRightChild(), curr.value);
        } else {
            return oldValue;
        }
    }

Above answer is wrong. It has to be something like this.

    public static Node findSecondLargest(Node node) {
        Node secondLargest = null;
        Node parent = null;
        Node child = node;
        if (node != null && (node.hasLeftChild() || node.hasRightChild())) {
            if (node.hasRightChild()) {
                while (child.hasRightChild()) {
                    parent = child;
                    child = child.rightChild();
                }
                secondLargest = parent;
            } else if (node.hasLeftChild()) {
                child = node.leftChild();
                while (child.hasRightChild()) {
                    child = child.rightChild();
                }
                secondLargest = child;
            }
        }
        return secondLargest;
    }

The above answer is also wrong.

    Node findSecondLargest(Node root) {
        // If the tree is null or a single node, return null (no second largest)
        if (root == null || (root.left == null && root.right == null)) return null;
        Node parent = null, child = root;
        // Find the rightmost child
        while (child.right != null) {
            parent = child;
            child = child.right;
        }
        // If the rightmost child has no left child, its parent is the second largest
        if (child.left == null) return parent;
        // Otherwise, the rightmost node of its left subtree is the second largest
        child = child.left;
        while (child.right != null) child = child.right;
        return child;
    }

Soln by "mindpower" works. Thank you. I am trying to solve a similar problem: find the 2nd nearest higher value (in in-order traversal) for a given node. E.g., given nums 12, 7, 14, 3, construct a BST. If the given value is 7, we should return 14 (sort order: 3, 7, 12, 14); if the given value is 3, we should return 12.

Generic solution in C# for any k. Notice that this example can easily be changed to find the k-th smallest node by doing a depth-first recursion on root.Left first, and then a tail recursion on root.Right.

    public Node GetKthLargest(int k)
    {
        return GetKthLargest(ref k, this.Root);
    }

    Node GetKthLargest(ref int k, Node root)
    {
        if (root == null || k < 1) return null;
        var node = GetKthLargest(ref k, root.Right);
        if (node != null) return node;
        if (--k == 0) return root;
        return GetKthLargest(ref k, root.Left);
    }

Recursion is not needed.

    Node SecondLargest(Node root) {
        if (root.right == null) return root.left;
        Node secondLargest = root;
        // Walk right until the next node is the rightmost one
        while (secondLargest.right.right != null) secondLargest = secondLargest.right;
        return secondLargest;
    }

    int getmax(node *root) {
        if (root->right == NULL) {
            return root->d;
        }
        return getmax(root->right);
    }

    int secondmax(node *root) {
        if (root == NULL) {
            return -1;
        }
        if (root->right == NULL && root->left != NULL) {
            return getmax(root->left);
        }
        if (root->right != NULL) {
            if (root->right->right == NULL && root->right->left == NULL) {
                return root->d;
            }
        }
        return secondmax(root->right);
    }

In-order traverse the tree. The second-to-last element in the resulting array is the answer.

In Python:

    def find_second_largest_bst_element(root, parent=None):
        if root.right is None:  # right-most element
            if root.left is not None:  # left subtree exists: take its largest node
                node = root.left
                while node.right is not None:
                    node = node.right
                return node
            # leaf: the parent is the answer (None if root is the only element)
            return parent
        # check right subtree
        return find_second_largest_bst_element(root.right, root)

    find_second_largest_bst_element(root)

For kth smallest, descend the left subtree first.

    class Node:
        def __init__(self, value, left=None, right=None):
            self.value = value
            self.left = left
            self.right = right

    def findKthLargest(root, k):
        global count
        if root is None:
            return
        findKthLargest(root.right, k)
        count += 1
        if count == k:
            print(root.value)
            return
        findKthLargest(root.left, k)

    count = 0
    r = Node(10, Node(5, Node(2), Node(7)), Node(30, Node(22), Node(32)))
    findKthLargest(r, 3)

    // solution in Java (assumes each node stores a parent pointer)
    // main routine
    Node findSecondMax(Node root) {
        if (root == null || (root.left == null && root.right == null))
            return null;
        else {
            Node max = findMax(root);
            // If the maximum has a left subtree, the answer is that subtree's maximum
            return (max.left != null) ? findMax(max.left) : max.parent;
        }
    }

    // helper routine, recursive implementation; can also be done non-recursively
    Node findMax(Node root) {
        return (root.right == null) ? root : findMax(root.right);
    }

Find the largest number in the binary tree and delete it. Then find the largest number again. Short and fast.

Reverse in-order traversal of the BST, keeping a count of the number of visited nodes. This method works great to return the kth largest element in a BST.

mindpower's solution looks right.
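The approach that the thread converges on (walk to the rightmost node, then take either its parent or the largest node of its left subtree) can be sketched in Python as follows. This is a minimal illustration, not any poster's exact code; the `Node` class and the function names `rightmost` and `second_largest` are assumptions made for the example.

```python
class Node:
    """Minimal BST node; the class and field names are illustrative."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def rightmost(node):
    """Return the node holding the largest value of a non-empty subtree."""
    while node.right is not None:
        node = node.right
    return node


def second_largest(root):
    """Return the second largest value, or None if the tree has fewer than two nodes."""
    if root is None or (root.left is None and root.right is None):
        return None
    parent, current = None, root
    # Walk to the largest node, remembering its parent.
    while current.right is not None:
        parent, current = current, current.right
    # If the largest node has a left subtree, the answer is that subtree's maximum.
    if current.left is not None:
        return rightmost(current.left).value
    # Otherwise the largest node is a leaf and its parent is the answer.
    return parent.value


tree = Node(10, Node(5, Node(2), Node(7)), Node(30, Node(22), Node(32)))
print(second_largest(tree))  # 30
```

The loop runs in time proportional to the height of the tree and uses constant extra space, which is why several answers prefer it over a full in-order traversal.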

17 Jan 2018

26 May 2013
Write a function that takes in two sorted lists and outputs a sorted list that is their union.

10 Answers

    f(a, b) { return sort(unique(a, b)) }

    def sortedUnion(list1, list2):
        # Concatenate both lists, then deduplicate and sort
        list3 = list1 + list2
        return sorted(set(list3))

Google merge sort.

Write two helpers: (1) INSERT(A, b), which puts element b into A in sorted order, and (2) DEL(A, a), which deletes element a from A. Then do this recursion:

    f(A, B):
        if max(A) <= min(B) return [A B]
        else { B = INSERT(B, max(A)); A = DEL(A, max(A)); f(A, B); }

Something like that. Try coding and testing it; I haven't.

Oops, check/write a termination condition.

In Python, you could do:

    def merge_sort(a, b):
        # The built-in set type replaces the old sets.Set class
        return sorted(set(a).union(set(b)))

    def sorted_union(list1, list2):
        union = set(list1).union(set(list2))
        sorted_union = sorted(list(union))
        return sorted_union

Second part of merge sort. Don't answer with sort(a), etc. Anyone can do that...

    def merge(A, B):
        i = 0
        j = 0
        sorted_list = []
        while i < len(A) and j < len(B):
            if A[i] <= B[j]:
                sorted_list.append(A[i])
                i += 1
            else:
                sorted_list.append(B[j])
                j += 1
        if i < len(A):
            sorted_list.extend(A[i:])
        elif j < len(B):
            sorted_list.extend(B[j:])
        return sorted_list

I assumed that we cannot use any "sort" function and that we want it in linear time. So here it is:

    def my_sort(list_a, list_b):
        if len(list_a) == 0:
            return list_b
        elif len(list_b) == 0:
            return list_a
        else:
            # Index instead of pop so the caller's lists are not modified
            if list_a[-1] > list_b[-1]:
                return my_sort(list_a[0:-1], list_b) + [list_a[-1]]
            else:
                return my_sort(list_a, list_b[:-1]) + [list_b[-1]]

In SQL:

    SELECT List1 FROM Table1
    UNION
    SELECT List2 FROM Table2
    ORDER BY List1;
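Several answers point at the merge step of merge sort but keep duplicates, which makes the result a merge and not a union. A minimal sketch of a linear-time union, assuming both inputs are sorted in ascending order, could look like this. The function name `sorted_union` and the variable names are chosen for illustration.

```python
def sorted_union(a, b):
    """Merge two ascending lists into one ascending list without duplicates."""
    result = []
    i = j = 0
    while i < len(a) or j < len(b):
        # Take from a when b is exhausted or a's current element is not larger.
        if j >= len(b) or (i < len(a) and a[i] <= b[j]):
            candidate = a[i]
            i += 1
        else:
            candidate = b[j]
            j += 1
        # Because output is ascending, a duplicate can only equal the last element.
        if not result or result[-1] != candidate:
            result.append(candidate)
    return result


print(sorted_union([1, 2, 2, 5], [2, 3, 5, 8]))  # [1, 2, 3, 5, 8]
```

Each element is visited once, so the running time is proportional to the combined length of the two lists, and no call to a sort function is needed.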

### Software Engineer at Two Sigma was asked...

22 Apr 2012
How to compress a prefix tree?

2 Answers

Not sure what this is looking for without some background. Check out this link though, it should lead you down the correct path: http://crpit.com/confpapers/CRPITV17Sucahyo.pdf

The above PDF link is aimed at researchers and is a bit overwhelming for most software developers. I guess the interviewer wants you to discuss something like a "directed acyclic word graph" (DAWG). Simply put, a DAWG not only shares prefixes like a trie, but also shares suffixes to compress the data. Here is the Wikipedia link: http://en.wikipedia.org/wiki/Directed_acyclic_word_graph
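The thread does not say which compression the interviewer had in mind. As one possible illustration, the sketch below shows a simpler technique than the DAWG mentioned above: collapsing every chain of single-child nodes into one edge, which turns a trie into a radix (compact prefix) tree. The nested-dict representation, the `END` marker, and the function names are all assumptions made for this example.

```python
END = "$"  # assumed end-of-word marker; it must not occur inside the words


def build_trie(words):
    """Build a plain trie as nested dicts, one character per edge."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def compress(node):
    """Return a copy in which each chain of single-child nodes becomes one edge."""
    compressed = {}
    for label, child in node.items():
        # Extend the edge label while the child has exactly one outgoing edge
        # and no word ends at the child.
        while label != END and len(child) == 1 and END not in child:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        compressed[label] = compress(child)
    return compressed


print(compress(build_trie(["tea", "team", "ten"])))
# {'te': {'a': {'$': {}, 'm': {'$': {}}}, 'n': {'$': {}}}}
```

This removes nodes that carry no branching information but does not share common suffixes; sharing suffixes as well is what a DAWG adds.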

16 Feb 2012
Generating a sorted vector from two sorted vectors.

3 Answers

Keep two pointers and compare the two numbers they point to. Move the pointer that points to the smaller or equal number. End the loop when both pointers reach the end.

Look at merge in mergesort; it does the exact same thing.

Merge sort is the best. Many languages have this function built in; otherwise it can also be done manually. Assume two vectors A = [1, 2, 3, 4] and B = [5, 6, 7, 8] and merge them. Compare the last value of A with the first value of B. In our case 4 < 5 is true, so the result is simply A followed by B. If it is false, then move the number up, compare it with the previous number, and so on.
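As an example of the built-in route mentioned in the last answer, Python's standard library provides `heapq.merge`, which lazily merges already-sorted inputs and keeps duplicates. The wrapper name `merge_sorted` below is an assumption for illustration.

```python
import heapq


def merge_sorted(a, b):
    """Merge two ascending sequences into one ascending list, keeping duplicates."""
    # heapq.merge assumes each input is already sorted and returns an iterator.
    return list(heapq.merge(a, b))


print(merge_sorted([1, 2, 3, 4], [5, 6, 7, 8]))  # [1, 2, 3, 4, 5, 6, 7, 8]
print(merge_sorted([1, 3, 3], [2, 3]))           # [1, 2, 3, 3, 3]
```

In an interview the hand-written two-pointer loop is usually what is being asked for; the library call is mainly useful for checking a manual implementation.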

### Data Scientist at Square was asked...

1 Mar 2013
How do you test whether a new credit risk scoring model works? What data would you look at?

4 Answers

I think I did fairly well on the data side, but I think I should have connected this to a model or something. Not fully sure on this one.

One could use the machine learning concept known as cross validation as an element to solve for this case. Assuming that in the development of the model, borrower data has already been broken into several subsets (a training, a validation, and a test set) and part of this subset data has already been used to fit and tune the model (the training and validation sets), the test set can then be used to provide an unbiased and independent assessment of the model's performance. In this case, we would be interested in comparing the MSEs of the training and test sets, which should be roughly equivalent if the model is good.

An ideal model will have the lowest sum of squared bias and variance. If the model already has the lowest expected error compared to other models, it is the only choice for a working model.

F1 score, look at bias vs variance trade-off and optimization, run it on a percentage of new data coming in. This EXACT question is on InterviewQuery.com.
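Two ideas from these answers, the F1 score and comparing performance on training data with performance on held-out data, can be sketched together. This is a minimal illustration assuming binary labels where 1 means default; the function names are hypothetical and the label lists in the usage lines are made-up toy values, not real credit data.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, treating 1 as the positive (default) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def generalization_gap(train_true, train_pred, test_true, test_pred):
    """Training F1 minus test F1; a large positive gap suggests overfitting."""
    return f1_score(train_true, train_pred) - f1_score(test_true, test_pred)


# Toy labels for illustration only.
gap = generalization_gap([1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0])
print(gap)  # 0.5
```

A gap close to zero is consistent with the second answer's point that training and test error should be roughly equivalent for a model that generalizes.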

### Data Scientist at BlackLocus was asked...

6 Sep 2019
Describe your experience with optimization algorithms and process.

1 Answer

This was a bizarre question, and the interviewer persisted in asking follow-ups, even though I explained that my experience with optimization was using portfolio optimization packages for the investment industry. He explained that operations research was his graduate training, so in effect the question was more about him than me. Also, when I told him I was having a hard time understanding the conversation, he said he was speaking from a computer and the company wifi "was bad, and not good for audio conversations" ( ! ).

3 Mar 2019