WeLM: A Well-Read Pre-trained Language Model for Chinese
Hui Su, Xiao Zhou, Houjin Yu, Xiaoyu Shen, Yuwen Chen, Zilin Zhu, Yang Yu, Jie Zhou

TL;DR
WeLM is a 10-billion-parameter Chinese pre-trained language model that performs strongly in zero-shot, few-shot, and multi-task settings. It outperforms existing models of similar size on Chinese tasks and outperforms existing multilingual models on multilingual tasks.
Contribution
The paper introduces WeLM, a 10B-parameter Chinese language model trained on a high-quality corpus, demonstrating superior zero-shot and few-shot performance and multi-lingual capabilities, with novel prompt fine-tuning and interpretability features.
Findings
Outperforms existing pre-trained models of similar size on 18 Chinese tasks
Matches the performance of models up to 25 times larger
Outperforms existing multilingual models pre-trained on 30 languages in multilingual and code-switching understanding
Abstract
Large Language Models pre-trained with self-supervised learning have demonstrated impressive zero-shot generalization capabilities on a wide spectrum of tasks. In this work, we present WeLM: a well-read pre-trained language model for Chinese that is able to seamlessly perform different types of tasks with zero or few-shot demonstrations. WeLM is trained with 10B parameters by "reading" a curated high-quality corpus covering a wide range of topics. We show that WeLM is equipped with broad knowledge on various domains and languages. On 18 monolingual (Chinese) tasks, WeLM can significantly outperform existing pre-trained models with similar sizes and match the performance of models up to 25 times larger. WeLM also exhibits strong capabilities in multi-lingual and code-switching understanding, outperforming existing multilingual language models pre-trained on 30 languages. Furthermore, We…
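The abstract describes WeLM performing tasks from zero or a few demonstrations placed in the prompt. The sketch below shows, under stated assumptions, how such an in-context prompt is commonly assembled: the function `build_few_shot_prompt`, its `input_prefix`/`output_prefix` parameters, and the blank-line layout are hypothetical illustrations, not the prompt template used in the WeLM paper. With an empty demonstration list the same function yields a zero-shot prompt; the model would be asked to continue the text after the final output prefix.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    demonstrations: List[Tuple[str, str]],
    query: str,
    instruction: str = "",
    input_prefix: str = "Input: ",
    output_prefix: str = "Output: ",
) -> str:
    """Assemble a k-shot prompt (hypothetical template, not WeLM's own).

    Each demonstration is an (input, output) pair. Passing no
    demonstrations produces a zero-shot prompt.
    """
    blocks = []
    if instruction:
        # Optional natural-language task description placed first.
        blocks.append(instruction)
    for source, target in demonstrations:
        # Each solved example is shown as an input line followed by its answer.
        blocks.append(f"{input_prefix}{source}\n{output_prefix}{target}")
    # The query ends with the output prefix so the model continues with the answer.
    blocks.append(f"{input_prefix}{query}\n{output_prefix}")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical Chinese sentiment-classification demonstrations.
    demos = [("这部电影太精彩了", "正面"), ("服务很差，不会再来", "负面")]
    prompt = build_few_shot_prompt(
        demos,
        "味道不错，价格也合理",
        instruction="判断评论的情感倾向。",
        input_prefix="评论：",
        output_prefix="情感：",
    )
    print(prompt)
```

Keeping the template in plain strings makes it easy to swap instructions, prefixes, or the number of demonstrations when comparing zero-shot and few-shot behaviour.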
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
