Three-step interview process: the first interview was with the head of the team, followed by interviews with various engineers on the team. I did not get past the second interview. They asked me a number of questions on matrix multiplication, mostly about cache locality.
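For readers unfamiliar with the topic, here is a minimal sketch of the kind of thing a cache-locality question on matrix multiplication usually gets at. It is not the actual interview question; the function names `matmul_naive` and `matmul_tiled`, the flat row-major layout, and the `tile` parameter are all assumptions made for illustration. The naive version walks down a column of B in its inner loop, while the tiled version works on small blocks and reads B along rows.

```python
def matmul_naive(a, b, n):
    """C = A * B for n x n matrices stored as flat row-major lists (i, j, k order)."""
    c = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                # b[k * n + j] strides by n elements per step: poor spatial locality.
                s += a[i * n + k] * b[k * n + j]
            c[i * n + j] = s
    return c


def matmul_tiled(a, b, n, tile=2):
    """Same product, computed block by block (loop tiling, hypothetical tile size)."""
    c = [0.0] * (n * n)
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Work on one tile at a time so the touched parts of A, B and C
                # are small enough to stay in cache on real hardware.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        aik = a[i * n + k]
                        # Inner loop reads b and writes c along a row: unit stride.
                        for j in range(jj, min(jj + tile, n)):
                            c[i * n + j] += aik * b[k * n + j]
    return c
```

In pure Python the interpreter overhead hides most of the cache effect, so this only shows the loop structure; the same reordering in C or a CUDA kernel (with tiles held in shared memory) is where the speedup would be expected.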
Interview questions [1]
Question 1
Questions about my background, plus some parallel computing/CUDA questions.