On the Similarities Between Native, Non-native and Translated Texts
Ella Rabinovich, Sergiu Nisioi, Noam Ordan, Shuly Wintner

TL;DR
This study computationally compares native, advanced non-native, and translated texts. It finds that non-native and translated texts are closer to each other than either is to native text, and that some of their distinguishing characteristics depend on the native or source language while others do not.
Contribution
A computational analysis of the similarities and differences among three language varieties: native, advanced non-native, and translated text.
Findings
Non-native and translated texts are more similar to each other than to native texts.
All three text types are easily distinguishable.
Some characteristics depend on the source or native language, while others do not.
Abstract
We present a computational analysis of three language varieties: native, advanced non-native, and translation. Our goal is to investigate the similarities and differences between non-native language productions and translations, contrasting both with native language. Using a collection of computational methods we establish three main results: (1) the three types of texts are easily distinguishable; (2) non-native language and translations are closer to each other than each of them is to native language; and (3) some of these characteristics depend on the source or native language, while others do not, reflecting, perhaps, unified principles that similarly affect translations and non-native language.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Translation Studies and Practices
