Analysing Coreference in Transformer Outputs

Ekaterina Lapshinova-Koltunski; Cristina España-Bonet; Josef van Genabith

arXiv:1911.01188·cs.CL·November 5, 2019



TL;DR

This paper investigates how neural machine translation systems handle coreference phenomena across different genres and data settings, revealing potential translationese effects and areas for improvement in coreference translation accuracy.

Contribution

It provides a detailed analysis of coreference translation in neural MT outputs, introducing an error typology and comparing system outputs to source and human translations.

Findings

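01. Machine-translated outputs show stronger potential translationese effects than human translations.

02. The coreference error typology goes beyond pronoun translation adequacy and includes types such as incorrect word selection and missing words.

03. The analysis highlights differences between the two genres (news and TED talks) and between the data settings used to train the three systems.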

Abstract

We analyse coreference phenomena in three neural machine translation systems trained with different data settings, with or without access to explicit intra- and cross-sentential anaphoric information. We compare system performance on two different genres: news and TED talks. To do this, we manually annotate the (possibly incorrect) coreference chains in the MT outputs and evaluate the coreference chain translations. We define an error typology that aims to go further than pronoun translation adequacy and includes types such as incorrect word selection or missing words. The features of coreference chains in automatic translations are also compared to those of the source texts and human translations. The analysis shows stronger potential translationese effects in machine-translated outputs than in human translations.
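To make the comparison of chain features more concrete, here is a minimal sketch of how annotated coreference chains might be represented and summarised. It is an illustration only, not the authors' annotation scheme or tooling: the `Mention` class, the `chain_features` function, the chosen features (chain count, mean chain length, cross-sentential chains) and the error label strings are all assumptions made for this example. Only the error type names "incorrect word selection" and "missing words" come from the abstract.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Mention:
    """One mention in a coreference chain (hypothetical representation)."""
    text: str
    sentence: int      # index of the sentence that contains the mention
    error: str = ""    # assumed error label; empty string means no error


Chain = List[Mention]


def chain_features(chains: List[Chain]) -> Dict[str, object]:
    """Summarise a set of chains with a few illustrative features."""
    lengths = [len(chain) for chain in chains]
    # A chain counts as cross-sentential if its mentions span >1 sentence.
    cross = sum(1 for chain in chains
                if len({m.sentence for m in chain}) > 1)
    # Count error labels over all mentions that carry one.
    errors = Counter(m.error for chain in chains for m in chain if m.error)
    return {
        "num_chains": len(chains),
        "mean_length": sum(lengths) / len(chains) if chains else 0.0,
        "cross_sentential": cross,
        "errors": dict(errors),
    }


# Toy MT output with two chains; the annotations are invented.
mt_chains = [
    [Mention("the minister", 0), Mention("she", 1),
     Mention("her", 1, error="incorrect_word_selection")],
    [Mention("a plan", 0), Mention("", 0, error="missing_word")],
]
mt_summary = chain_features(mt_chains)
```

Running the same function over chains annotated in the source text, the human translation and each MT output would give comparable summaries per text, which is the kind of side-by-side comparison the abstract describes.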


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification