MiLMo: Minority Multilingual Pre-trained Language Model
Junjie Deng, Hanru Shi, Xinhe Yu, Wugedele Bao, Yuan Sun, Xiaobing Zhao

TL;DR
MiLMo is a new multilingual pre-trained language model specifically designed to improve performance on minority languages like Mongolian, Tibetan, Uyghur, Kazakh, and Korean, addressing resource scarcity issues.
Contribution
The paper introduces MiLMo, a multilingual pre-trained model tailored for minority languages, along with a new dataset, MiTC, and a comparison against word2vec models.
Findings
MiLMo outperforms word2vec in minority language text classification.
The model achieves the best results on the MiTC dataset.
Resources are publicly available at http://milmo.cmli-nlp.com/.
Abstract
Pre-trained language models are trained on large-scale unlabeled data and can then be fine-tuned on small labeled datasets to achieve good results. Multilingual pre-trained language models are trained on multiple languages, so a single model can understand several languages at once. At present, research on pre-trained models focuses mainly on resource-rich languages, while there is relatively little research on low-resource languages such as minority languages, and publicly available multilingual pre-trained language models do not work well for minority languages. Therefore, this paper constructs a multilingual pre-trained model named MiLMo that performs better on minority language tasks, covering Mongolian, Tibetan, Uyghur, Kazakh and Korean. To solve the problem of scarcity of datasets on minority languages and verify the effectiveness of the MiLMo model, this paper…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
