Go Vivace interview question

Why and when do we use a multi-head attention module in natural language processing?