Comparing Feature-Engineering and Feature-Learning Approaches for Multilingual Translationese Classification
Daria Pylypenko, Kwabena Amponsah-Kaakyire, Koel Dutta Chowdhury, Josef van Genabith, Cristina España-Bonet

TL;DR
This paper compares traditional feature-engineering methods with neural feature-learning approaches for multilingual translationese classification. It shows that neural models outperform the feature-engineering baselines and examines how well hand-crafted features explain the neural models' predictions.
Contribution
It provides a comprehensive comparison between handcrafted features and neural architectures, highlighting neural models' superior performance and analyzing feature importance across models.
Findings
Neural architectures outperform feature-engineering approaches by over 20 accuracy points.
BERT-based models perform best in both monolingual and multilingual settings.
Multilingual experiments support the existence of translationese universals across languages.
Abstract
Traditional hand-crafted linguistically-informed features have often been used for distinguishing between translated and original non-translated texts. By contrast, to date, neural architectures without manual feature engineering have been less explored for this task. In this work, we (i) compare the traditional feature-engineering-based approach to the feature-learning-based one and (ii) analyse the neural architectures in order to investigate how well the hand-crafted features explain the variance in the neural models' predictions. We use pre-trained neural word embeddings, as well as several end-to-end neural architectures in both monolingual and multilingual settings and compare them to feature-engineering-based SVM classifiers. We show that (i) neural architectures outperform other approaches by more than 20 accuracy points, with the BERT-based model performing the best in both the…
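To make the second research question more concrete, the following is a minimal sketch of what "how well the hand-crafted features explain the variance in the neural models' predictions" could mean in practice. It assumes three simple surface features and a one-variable least-squares fit. The feature set, the `FUNCTION_WORDS` list, and the function names `handcrafted_features` and `r_squared` are all hypothetical and are not taken from the paper, which may use a different feature inventory and a different regression setup.

```python
import re
from statistics import mean

# Hypothetical function-word list, for illustration only (not the paper's features).
FUNCTION_WORDS = {"the", "of", "and", "to", "in", "that", "a", "is"}


def handcrafted_features(text: str) -> dict[str, float]:
    """Compute three simple surface features for one text chunk."""
    # Naive sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return {
            "type_token_ratio": 0.0,
            "mean_sentence_length": 0.0,
            "function_word_ratio": 0.0,
        }
    return {
        # Lexical variety: distinct tokens divided by all tokens.
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Average number of tokens per sentence.
        "mean_sentence_length": len(tokens) / max(len(sentences), 1),
        # Share of tokens that appear in the function-word list.
        "function_word_ratio": sum(t in FUNCTION_WORDS for t in tokens) / len(tokens),
    }


def r_squared(xs: list[float], ys: list[float]) -> float:
    """R^2 of a one-variable least-squares fit of ys on xs.

    For a single predictor this equals the squared Pearson correlation.
    Returns 0.0 when either variable is constant.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (sxy ** 2) / (sxx * syy)


if __name__ == "__main__":
    # Toy texts and placeholder scores, invented purely to exercise the functions.
    texts = [
        "The cat sat. The dog ran.",
        "It is said that the result of the study is clear.",
        "Birds sing loudly at dawn!",
    ]
    placeholder_scores = [0.2, 0.9, 0.1]  # stand-in for a neural model's outputs
    ratios = [handcrafted_features(t)["function_word_ratio"] for t in texts]
    print(r_squared(ratios, placeholder_scores))
```

A value of `r_squared` near 1 would mean that the single feature accounts for most of the variation in the scores, and a value near 0 would mean it accounts for very little. Fitting several features jointly would require a multiple regression, which this sketch leaves out.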
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Support Vector Machine
