An Empirical Study of Factors Affecting Language-Independent Models
Xiaotong Liu, Yingbei Tong, Anbang Xu, Rama Akkiraju

TL;DR
This study empirically examines factors influencing language-independent models in NLP, demonstrating their effectiveness across tasks, languages, and resource levels, with particular success in typologically similar and low-resource languages.
Contribution
It provides a comprehensive empirical analysis of factors affecting multilingual models, highlighting their competitive performance and applicability across diverse languages and data resource scenarios.
Findings
Language-independent models can match or even outperform monolingual models, most notably on sentence classification.
They are especially effective for typologically similar languages.
Models perform well even with limited data in low-resource languages.
Abstract
Scaling existing applications and solutions to multiple human languages has traditionally proven difficult, mainly because the preprocessing and feature engineering techniques used in traditional approaches are language-dependent. In this work, we empirically investigate the factors affecting language-independent models built with multilingual representations, including task type, language set, and data resource. On two of the most representative NLP tasks, sentence classification and sequence labeling, we show that language-independent models can be comparable to or even outperform models trained on monolingual data, and that they are generally more effective on sentence classification. We experiment with language-independent models on many different languages and show that they are more suitable for typologically similar languages. We also explore the effects of different…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
