Using CollGram to Compare Formulaic Language in Human and Neural Machine Translation

Yves Bestgen

arXiv:2107.03625 · cs.CL · July 26, 2021

TL;DR

This paper compares formulaic language in human and neural machine translations of quality newspaper articles. The neural translations contain fewer low-frequency but strongly associated formulaic sequences and more high-frequency ones, and the neural systems differ from one another in how many sequences of each type they produce.

Contribution

It uses CollGram to analyze and compare formulaic sequences in human and neural machine translations, highlighting systematic differences between the two and variation among the neural systems.

Findings

01

Neural machine translations contain fewer low-frequency but strongly associated formulaic sequences than human translations.

02

Neural machine translations have more high-frequency formulaic sequences.

03

The differences are statistically significant, and the effect sizes are almost always medium or large.

Abstract

A comparison of formulaic sequences in human and neural machine translation of quality newspaper articles shows that neural machine translations contain fewer lower-frequency but strongly associated formulaic sequences, and more high-frequency formulaic sequences. These differences were statistically significant and the effect sizes were almost always medium or large. These observations can be related to the differences between second language learners of various levels and between translated and untranslated texts. The comparison between the neural machine translation systems indicates that some systems produce more formulaic sequences of both types than other systems.
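
The contrast the abstract draws, between lower-frequency but strongly associated sequences and high-frequency ones, is the contrast usually captured in corpus linguistics by two bigram association measures: mutual information (MI), which favors rare but tightly bound pairs, and the t-score, which favors frequent pairs. The abstract does not name the measures or any thresholds, so the following is only a minimal sketch under that assumption, using the textbook formulas. The function name `bigram_association` and the toy corpus are hypothetical, and a real analysis would take its counts from a large reference corpus rather than from the texts being scored.

```python
import math
from collections import Counter


def bigram_association(tokens):
    """Return {(w1, w2): (mi, t_score)} for contiguous bigrams.

    Assumption: textbook definitions, with expected frequency
    E = f(w1) * f(w2) / N, MI = log2(O / E), t = (O - E) / sqrt(O).
    N is approximated by the number of tokens in the corpus.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), observed in bigrams.items():
        expected = unigrams[w1] * unigrams[w2] / n
        mi = math.log2(observed / expected)
        t_score = (observed - expected) / math.sqrt(observed)
        scores[(w1, w2)] = (mi, t_score)
    return scores


# Toy reference corpus (hypothetical): "of the" is frequent,
# "prime minister" is rare but its words only occur together.
corpus = ["of", "the", "of", "the", "prime", "minister"]
for bigram, (mi, t_score) in bigram_association(corpus).items():
    print(bigram, round(mi, 3), round(t_score, 3))
```

In this toy corpus the rare pair "prime minister" receives the higher MI while the frequent pair "of the" receives the higher t-score, which is the distinction behind the two types of formulaic sequence compared above.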

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling