Analyzing the Attention Heads for Pronoun Disambiguation in Context-aware Machine Translation Models
Paweł Mąka, Yusuf Can Semerci, Jan Scholtes, Gerasimos Spanakis

TL;DR
This paper investigates how attention heads in context-aware machine translation models contribute to pronoun disambiguation, revealing underutilized heads and demonstrating performance improvements through targeted fine-tuning.
Contribution
It provides a detailed analysis of attention head roles in pronoun disambiguation and shows that fine-tuning specific heads can raise pronoun disambiguation accuracy by up to 5 percentage points.
Findings
Some attention heads attend to the relevant relations but do not influence disambiguation.
Certain heads are underutilized, indicating room for performance improvement.
Fine-tuning key heads increases pronoun disambiguation accuracy by up to 5 percentage points.
Abstract
In this paper, we investigate the role of attention heads in Context-aware Machine Translation models for pronoun disambiguation in the English-to-German and English-to-French language directions. We analyze their influence by both observing and modifying the attention scores corresponding to the plausible relations that could impact a pronoun prediction. Our findings reveal that while some heads do attend the relations of interest, not all of them influence the models' ability to disambiguate pronouns. We show that certain heads are underutilized by the models, suggesting that model performance could be improved if only the heads would attend one of the relations more strongly. Furthermore, we fine-tune the most promising heads and observe the increase in pronoun disambiguation accuracy of up to 5 percentage points which demonstrates that the improvements in performance can be…
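To make "observing and modifying the attention scores" concrete, here is a minimal sketch of the general idea for a single head and a single query position. It is an illustration under stated assumptions, not the authors' implementation: the function names `head_attention` and `softmax`, the `boost_index` and `boost` parameters, and the toy vectors are all hypothetical, and adding a constant to one pre-softmax score is only one plausible way to push a head toward a chosen relation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def head_attention(query, keys, boost_index=None, boost=0.0):
    """Attention weights of one head for a single query position.

    Scores are scaled dot products. If boost_index is given, `boost` is
    added to that key's pre-softmax score (a hypothetical intervention
    that makes the head attend more strongly to one token).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    if boost_index is not None:
        scores[boost_index] += boost
    return softmax(scores)


# Toy example: the query stands in for a pronoun, the keys for context tokens.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# Observation: read the weights the head assigns as-is.
observed = head_attention(query, keys)

# Modification: push the head toward key 1 (e.g. a candidate antecedent).
modified = head_attention(query, keys, boost_index=1, boost=2.0)
```

Comparing a model's pronoun prediction under `observed` and `modified` weights is the kind of check that separates heads that merely attend to a relation from heads whose attention actually changes the output.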
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: Softmax · Attention Is All You Need
