Language Chameleon: Transformation analysis between languages using Cross-lingual Post-training based on Pre-trained language models
Suhyune Son, Chanjun Park, Jungseob Lee, Midan Shim, Chanhee Lee, Yoonna Jang, Jaehyung Seo, Heuiseok Lim

TL;DR
This paper investigates cross-lingual post-training (XPT) to improve language transfer for resource-scarce languages, demonstrating that XPT can outperform monolingual models even with minimal data, especially for typologically distant languages like Korean.
Contribution
The study provides an in-depth analysis of XPT's effectiveness for low-resource, typologically distant languages, focusing on Korean, which is rarely explored in detail.
Findings
XPT outperforms monolingual models with much less data.
XPT is highly efficient in transfer learning for low-resource languages.
Results show strong transfer performance for Korean, a language isolate.
Abstract
As pre-trained language models become more resource-demanding, the inequality between resource-rich languages such as English and resource-scarce languages is worsening. This can be attributed to the fact that the amount of available training data in each language follows a power-law distribution, and most languages belong to the long tail of that distribution. Several research areas attempt to mitigate this problem. For example, cross-lingual transfer learning and multilingual training aim to benefit long-tail languages via the knowledge acquired from resource-rich languages. Although successful, existing work has mainly focused on experimenting on as many languages as possible. As a result, targeted in-depth analysis is mostly absent. In this study, we focus on a single low-resource language and perform extensive evaluation and probing experiments using…
Taxonomy
Topics: Topic Modeling
