Comparing Feature-Engineering and Feature-Learning Approaches for Multilingual Translationese Classification

Daria Pylypenko; Kwabena Amponsah-Kaakyire; Koel Dutta Chowdhury; Josef van Genabith; Cristina España-Bonet

arXiv:2109.07604·cs.CL·September 17, 2021

TL;DR

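This paper compares traditional feature-engineering methods with neural feature-learning approaches for multilingual translationese classification. Neural models outperform the feature-engineering baselines by more than 20 accuracy points, and the paper analyses how well hand-crafted features explain the variance in the neural models' predictions.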

Contribution

The paper compares SVM classifiers built on hand-crafted features with pre-trained word embeddings and end-to-end neural architectures, in both monolingual and multilingual settings, and analyses how well the hand-crafted features explain the variance in the neural models' predictions.

Findings

01

Neural architectures outperform feature-engineering approaches by over 20 accuracy points.

02

BERT-based models perform best in both monolingual and multilingual settings.

03

Multilingual experiments support the existence of translationese universals across languages.

Abstract

Traditional hand-crafted linguistically-informed features have often been used for distinguishing between translated and original non-translated texts. By contrast, to date, neural architectures without manual feature engineering have been less explored for this task. In this work, we (i) compare the traditional feature-engineering-based approach to the feature-learning-based one and (ii) analyse the neural architectures in order to investigate how well the hand-crafted features explain the variance in the neural models' predictions. We use pre-trained neural word embeddings, as well as several end-to-end neural architectures in both monolingual and multilingual settings and compare them to feature-engineering-based SVM classifiers. We show that (i) neural architectures outperform other approaches by more than 20 accuracy points, with the BERT-based model performing the best in both the…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Support Vector Machine