Analysing Coreference in Transformer Outputs
Ekaterina Lapshinova-Koltunski, Cristina España-Bonet, Josef van Genabith

TL;DR
This paper investigates how neural machine translation systems handle coreference phenomena across different genres and data settings, revealing potential translationese effects and areas for improvement in coreference translation accuracy.
Contribution
It provides a detailed analysis of coreference translation in neural MT outputs, introducing an error typology and comparing system outputs to source and human translations.
Findings
Machine-translated outputs show stronger potential translationese effects than human translations.
Coreference errors include incorrect word choices and missing words.
Analysis highlights differences between genres and data settings.
Abstract
We analyse coreference phenomena in three neural machine translation systems trained with different data settings with or without access to explicit intra- and cross-sentential anaphoric information. We compare system performance on two different genres: news and TED talks. To do this, we manually annotate (the possibly incorrect) coreference chains in the MT outputs and evaluate the coreference chain translations. We define an error typology that aims to go further than pronoun translation adequacy and includes types such as incorrect word selection or missing words. The features of coreference chains in automatic translations are also compared to those of the source texts and human translations. The analysis shows stronger potential translationese effects in machine translated outputs than in human translations.
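As a rough illustration of the kind of comparison the abstract describes, the sketch below computes a few descriptive features of coreference chains for a source text, a human translation, and an MT output, and tallies error labels for the MT chains. Everything in it is an assumption made for illustration: the data layout, the placeholder mentions, the feature names, and the label strings are hypothetical and are not the authors' annotation scheme or results. Only the two error categories named in the abstract (incorrect word selection, missing words) are used.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy data: each chain is a list of mention placeholders.
# These values are illustrative only, not data from the paper.
chains_by_version = {
    "source": [["s-m1", "s-m2", "s-m3"], ["s-m4", "s-m5"]],
    "human": [["h-m1", "h-m2", "h-m3"], ["h-m4", "h-m5"]],
    "mt": [["t-m1", "t-m2", "t-m3"], ["t-m4", "t-m5"]],
}


def chain_features(chains):
    """Return simple descriptive features for a list of coreference chains."""
    lengths = [len(chain) for chain in chains]
    return {
        "n_chains": len(chains),
        "n_mentions": sum(lengths),
        "mean_chain_length": mean(lengths) if lengths else 0.0,
    }


# Hypothetical error labels assigned to mentions in the MT chains.
mt_errors = [
    "incorrect_word_selection",
    "missing_word",
    "incorrect_word_selection",
]

# Features per text version, so source, human, and MT can be set side by side.
features = {name: chain_features(chains) for name, chains in chains_by_version.items()}

# Frequency of each error label in the MT output.
error_counts = Counter(mt_errors)

if __name__ == "__main__":
    for name, feats in features.items():
        print(name, feats)
    print(dict(error_counts))
```

Keeping the features in one dictionary per version makes it straightforward to compare how far the MT output and the human translation each diverge from the source on any given feature.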
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
