Data Scientist Interview Questions in San Jose, CA | Glassdoor.co.in

# Data Scientist Interview Questions in San Jose, CA, US

233 data scientist interview questions shared by candidates

## Top Interview Questions


25 Feb 2012
Find the second largest element in a Binary Search Tree

16 Answers

Find the right-most element. If it is a right node with no children, return its parent. Otherwise, return the largest element of its left subtree.

One addition is the situation where the tree has no right branch (the root is the largest). In this special case the right-most node does not have a parent. So it is better to keep track of both a parent pointer and a current pointer: if they differ, the original method by the candidate works well; if they are the same (the root situation), find the largest element of the root's left branch.

```java
if (root == null || !root.hasRightChild()) {
    return null;
} else {
    return findSecondGreatest(root, root.getValue());
}

value findSecondGreatest(Node curr, value oldValue) {
    if (curr.hasRightChild()) {
        return findSecondGreatest(curr.getRightChild(), curr.value);
    } else {
        return oldValue;
    }
}
```

The above answer is wrong. It has to be something like this:

```java
public static Node findSecondLargest(Node node) {
    Node secondLargest = null;
    Node parent = null;
    Node child = node;
    if (node != null && (node.hasLeftChild() || node.hasRightChild())) {
        if (node.hasRightChild()) {
            while (child.hasRightChild()) {
                parent = child;
                child = child.rightChild();
            }
            secondLargest = parent;
        } else if (node.hasLeftChild()) {
            child = node.leftChild();
            while (child.hasRightChild()) {
                child = child.rightChild();
            }
            secondLargest = child;
        }
    }
    return secondLargest;
}
```

The above answer is also wrong:

```java
Node findSecondLargest(Node root) {
    // If the tree is null or a single node, return null (no second largest)
    if (root == null || (root.left == null && root.right == null)) return null;
    Node parent = null, child = root;
    // Find the right-most child
    while (child.right != null) {
        parent = child;
        child = child.right;
    }
    // If the right-most child has no left child, its parent is the second largest
    if (child.left == null) return parent;
    // Otherwise, the right-most node under its left child is the second largest
    child = child.left;
    while (child.right != null) child = child.right;
    return child;
}
```

The solution by "mindpower" works. Thank you. I am trying to solve a similar problem: find the 2nd nearest higher value (in in-order traversal) for a given node. E.g., given the numbers 12, 7, 14, 3, construct a BST. If the given value is 7, we should return 14 (sort order: 3, 7, 12, 14). If the given value is 3, we should return 12 (sort order: 3, 7, 12, 14).

Generic solution in C# for any k. Notice that this example can be easily changed to find the k-th smallest node by doing a depth-first recursion on root.Left first, and then a tail recursion on root.Right.

```csharp
public Node GetKthLargest(int k)
{
    return GetKthLargest(ref k, this.Root);
}

Node GetKthLargest(ref int k, Node root)
{
    if (root == null || k < 1)
        return null;
    var node = GetKthLargest(ref k, root.Right);
    if (node != null)
        return node;
    if (--k == 0)
        return root;
    return GetKthLargest(ref k, root.Left);
}
```

Recursion is not needed.

```java
Node SecondLargest(Node root, Node secondLarge) {
    if (root.right == null) return root.left;
    Node secondLargest = root;
    // Stop at the parent of the right-most node
    while (secondLargest.right.right != null) secondLargest = secondLargest.right;
    return secondLargest;
}
```

```c
int getmax(node *root) {
    if (root->right == NULL) {
        return root->d;
    }
    return getmax(root->right);
}

int secondmax(node *root) {
    if (root == NULL) {
        return -1;
    }
    if (root->right == NULL && root->left != NULL) {
        return getmax(root->left);
    }
    if (root->right != NULL) {
        if (root->right->right == NULL && root->right->left == NULL) {
            return root->d;
        }
    }
    return secondmax(root->right);
}
```

In-order traverse the tree. The second-to-last element in the resulting array is the answer.

In Python:

```python
def find_second_largest_bst_element(root, parent=None):
    if root.right is None:  # right-most element
        if root.left is not None:  # left subtree exists: take its right-most node
            node = root.left
            while node.right is not None:
                node = node.right
            return node
        if parent is None:  # root is the only element of the BST
            return False
        return parent  # leaf
    # check right subtree
    return find_second_largest_bst_element(root.right, root)

find_second_largest_bst_element(root)
```

For kth smallest, descend the left subtree first.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def findKthLargest(root, k):
    global count
    if root is None:
        return
    findKthLargest(root.right, k)
    count += 1
    if count == k:
        print(root.value)
        return
    findKthLargest(root.left, k)

count = 0
r = Node(10, Node(5, Node(2), Node(7)), Node(30, Node(22), Node(32)))
findKthLargest(r, 3)
```

```java
// solution in java
// main routine
Node findSecondMax(Node root) {
    if (root == null || (root.left == null && root.right == null)) {
        return null;
    } else {
        Node max = findMax(root);
        // If the maximum has a left subtree, the answer is that subtree's maximum;
        // otherwise it is the maximum's parent.
        return (max.left != null) ? findMax(max.left) : max.parent;
    }
}

// helper routine, recursive implementation; can also be done non-recursively
Node findMax(Node root) {
    return (root.right == null) ? root : findMax(root.right);
}
```

Find the largest number in the binary tree and delete it. Then find the largest number again. Short and fast.

Reverse in-order traversal of the BST, keeping a count of the number of visited nodes. This method works great to return the kth largest element in a BST.

mindpower's solution looks right.
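The reverse in-order idea mentioned in the answers can also be written without recursion or a global counter. The following is a minimal sketch in Python, not taken from any of the answers; the `Node` class and the `kth_largest` name are assumptions made for illustration.

```python
class Node:
    # Hypothetical minimal BST node, used only for this sketch.
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def kth_largest(root, k):
    """Return the k-th largest value in the BST, or None if there are fewer than k nodes."""
    stack, node, seen = [], root, 0
    while stack or node is not None:
        # Walk as far right as possible, so larger values are visited first.
        while node is not None:
            stack.append(node)
            node = node.right
        node = stack.pop()
        seen += 1
        if seen == k:
            return node.value
        # Then continue with the left subtree of the visited node.
        node = node.left
    return None


tree = Node(10, Node(5, Node(2), Node(7)), Node(30, Node(22), Node(32)))
print(kth_largest(tree, 2))  # second largest: 30
```

The explicit stack visits only the nodes needed to reach the k-th largest value, so for k = 2 the work is proportional to the height of the tree rather than its size.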

16 Feb 2012
Generating a sorted vector from two sorted vectors.

3 Answers

Keep two pointers and compare the two numbers they point to. Move the pointer that points to the smaller or equal number. End the loop when both pointers reach the end.

Look at merge in mergesort; it does exactly the same thing.

Merge sort is the best. Many languages have this function built in; otherwise it can be done manually. Assume two vectors A = [1, 2, 3, 4] and B = [5, 6, 7, 8] and merge them. Compare the last value of A with the first value of B: in our case 4 < 5 is true, so the result is simply A followed by B. If it is false, move the number up, compare it with the previous number, and so on.
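The two-pointer approach from the first answer can be sketched as follows. This is a minimal Python illustration under the assumption that both inputs are already sorted in ascending order; the function name `merge_sorted` is made up for the example.

```python
def merge_sorted(a, b):
    """Merge two ascending lists into one ascending list in O(len(a) + len(b))."""
    merged = []
    i = j = 0
    # Advance whichever pointer refers to the smaller (or equal) element.
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # One input is exhausted; the rest of the other is already sorted.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged


print(merge_sorted([1, 2, 3, 4], [5, 6, 7, 8]))
print(merge_sorted([1, 4, 9], [2, 4, 10, 11]))
```

In Python specifically, the standard library's `heapq.merge` offers the same behaviour as a lazy iterator.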

### Data Scientist at Palo Alto Networks was asked...

27 Apr 2019
1. What's the relationship between PCA and k-means clustering?
2. What are the requirements for a matrix to represent a kernel? What happens if we run SVM using a 'kernel' that does not satisfy these requirements?
3. Problems using Python lists and dictionaries
4. SQL joins, aggregates (count, sum, avg), and cases
5. If you were given a dataset with [X] features (may be numerical, categorical, etc.) and you want to build a model (to determine fraudulent transactions, say), how would you determine which features are best to use in the model?

1 Answer

1. Both the output matrix of principal component vectors and the k-means cluster assignment matrix form an orthonormal basis of the resulting space.
2. It must be symmetric and positive semidefinite. SVM may not converge.
3. (not answered)
4. (not answered)
5. Many possible answers depending on the data: could use filter feature selection techniques like LDA or Pearson correlation, embedded techniques like lasso regression, or clustering.
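As one illustration of the filter approach mentioned for question 5, the sketch below ranks numerical features by the absolute value of their Pearson correlation with the label. It is a simplified example, not the interviewee's method: the function names, the feature names, and the toy data are all invented, and Pearson correlation only captures linear relationships.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, target):
    """Return (feature name, |correlation with target|) pairs, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up toy data: six transactions, three candidate features, binary fraud label.
columns = {
    "amount": [10.0, 200.0, 15.0, 950.0, 30.0, 800.0],
    "hour": [9, 14, 11, 3, 16, 2],
    "constant": [1, 1, 1, 1, 1, 1],
}
is_fraud = [0, 0, 0, 1, 0, 1]

for name, score in rank_features(columns, is_fraud):
    print(f"{name}: {score:.3f}")
```

A ranking like this is only a first pass; categorical features and non-linear effects would need other techniques, such as the embedded methods named in the answer.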

### Data Scientist at Netflix was asked...

19 Jul 2012
How do you know if one algorithm is better than the other?

2 Answers

Synthetic test.

I would say "define better". It can be better as in faster, better as in earning more money, better as in reducing unsubscribe clicks, etc.

14 Mar 2019
Business sense: a question on how to assess the impact of a hypothetical feature and its possible problems.

1 Answer

I think this is a generic question that can be answered in many ways. You have to be clear on the metric for measuring the outcome and need a good justification for it.

### Data Scientist/Computer Vision at Verb Surgical was asked...

29 Jun 2019
What makes you special -- what makes you stand out over everyone else? Heh heh --

1 Answer

There is a lot that makes me a one-off, but what was interesting to me was that this interviewer, who had drunk deep from the affirmative action milkshake and was mediocre at best, could not find anyone good enough for her. These kinds of hypocrites are tiresome and so lacking in self-awareness.

13 Aug 2019
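Classification vs. regression metrics for evaluation; how to handle missing or corrupt data; segmentation; objective/loss function definitions; how do you imagine an ML system; broadcasting in NumPy?

1 Answer

```python
# KNN implementation
import numpy as np

# Input:
#   data.shape == (1000, 10)
#   labels.shape == (1000,)
#   observation.shape == (1, 10)
#   k = 5
# Example rows:
#   (1, 0, 0, 1, 0, 0, 1, 0, 0, 0)
#   (1, 0, 0, 0.5, 0, 0, 1, 0, 0, 0)
# Distance: sqrt((x1 - xo1)^2 + (x2 - xo2)^2 + ...)
# for each i, i is in range(0, 10)
# remove

def something_to(data, data_point, k=5):
    # Broadcasting subtracts the (1, 10) observation from every row of the (1000, 10) data
    dist = np.sqrt(((data - data_point) * (data - data_point)).sum(axis=1))
    sorted_distances = np.sort(dist)  # ndarray.sort() sorts in place and returns None
    top_k = sorted_distances[:k]
    return top_k
```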
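The KNN answer stops at the k smallest distances and never uses the labels. A possible continuation through to a majority vote is sketched below. It swaps NumPy for the Python standard library (`math.dist`, `collections.Counter`) to stay dependency-free, and the `knn_predict` name and toy points are assumptions for illustration only.

```python
import math
from collections import Counter


def knn_predict(data, labels, observation, k=5):
    """Classify `observation` by majority vote among its k nearest rows of `data`."""
    # Pair every row's Euclidean distance to the observation with that row's label.
    distances = [(math.dist(row, observation), label) for row, label in zip(data, labels)]
    distances.sort(key=lambda pair: pair[0])
    top_k_labels = [label for _, label in distances[:k]]
    # Most frequent label among the k nearest neighbours.
    return Counter(top_k_labels).most_common(1)[0][0]


# Made-up toy data: two well-separated clusters in 2D.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
point_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, point_labels, (0.2, 0.1), k=3))
print(knn_predict(points, point_labels, (5.5, 5.5), k=3))
```

Sorting all distances costs O(n log n) per query, which is fine for a 1000-row example; selecting only the k smallest would be the natural optimization for larger data.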

### Senior Data Scientist at Publicis Groupe was asked...

7 Dec 2019
How to design a model for time series data using LSTM?

1 Answer

Explain projects and background.

### Data Scientist at HealthTap was asked...

5 Jan 2014
There are 25 horses. You can race any 5 of them at once, and all you get is the order they finished. How many races would you need to find the 3 fastest horses?

2 Answers

Just search for the 25 horses puzzle.

Found the solution here: http://www.programmerinterview.com/index.php/puzzles/25-horses-3-fastest-5-races-puzzle/
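The answers only point elsewhere, so here is a sketch of the commonly cited 7-race strategy: five heats, one race among the heat winners, and one final race among the five horses that can still be second or third. The code is a simulation written for illustration, with hypothetical names, and assumes all horses have distinct, consistent speeds.

```python
import random


def three_fastest(speeds):
    """Return ([1st, 2nd, 3rd] horse indices, number of races) for 25 distinct speeds."""
    races = 0

    def race(horses):
        # A race reveals only the finishing order of at most 5 horses.
        nonlocal races
        assert len(horses) <= 5
        races += 1
        return sorted(horses, key=lambda h: speeds[h], reverse=True)

    # Races 1-5: five heats of five horses each.
    heats = [race(list(range(start, start + 5))) for start in range(0, 25, 5)]
    heat_of = {heat[0]: heat for heat in heats}

    # Race 6: the five heat winners; its winner is the fastest horse overall.
    winners = race([heat[0] for heat in heats])
    a, b, c = (heat_of[w] for w in winners[:3])

    # Race 7: the only horses that can still be 2nd or 3rd overall.
    final = race([a[1], a[2], b[0], b[1], c[0]])
    return [a[0], final[0], final[1]], races


speeds = random.sample(range(1000), 25)  # random distinct speeds, higher is faster
top3, races = three_fastest(speeds)
print(top3, races)
```

Any horse outside the final race is excluded because at least three known horses are faster than it, which is why seven races suffice under these assumptions.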

### Data Scientist at Guardian Analytics was asked...

4 Nov 2015
Given an array of integers, find the maximum cumulative sum of a sub-set of the array

1 Answer

Wrote the answer in one line of R code. The interviewer did not know R, so we did the problem in a lower-level language. Got follow-up questions about big-O, then worked on optimizing the algorithm. It was a little over the top, as it was clear he did not know how to optimize it below an n^2 cost but knew there was a way.
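The answer does not say which faster method the interviewer had in mind. If "sub-set" means a contiguous, non-empty subarray (an assumption, since the question does not say so), the usual linear-time approach is Kadane's algorithm, sketched here with a made-up function name.

```python
def max_subarray_sum(nums):
    """Largest sum of any contiguous, non-empty run of `nums`, in O(n) time."""
    if not nums:
        raise ValueError("nums must be non-empty")
    best = current = nums[0]
    for x in nums[1:]:
        # Either extend the best run ending at the previous element, or start fresh at x.
        current = max(x, current + x)
        best = max(best, current)
    return best


print(max_subarray_sum([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6, from the run [4, -1, 2, 1]
```

Compared with checking every start and end pair at n^2 cost, this keeps only the best sum ending at the current position, so a single pass is enough.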