WeLM: A Well-Read Pre-trained Language Model for Chinese

Hui Su; Xiao Zhou; Houjin Yu; Xiaoyu Shen; Yuwen Chen; Zilin Zhu; Yang Yu; Jie Zhou

arXiv:2209.10372·cs.CL·May 17, 2023·6 cites

TL;DR

WeLM is a 10-billion-parameter Chinese pre-trained language model that performs a wide range of tasks from zero-shot or few-shot demonstrations. On 18 monolingual Chinese tasks it outperforms pre-trained models of similar size and matches models up to 25 times larger, and it outperforms existing multilingual language models on multilingual and code-switching understanding.

Contribution

The paper introduces WeLM, a 10B-parameter Chinese language model trained on a high-quality corpus, demonstrating superior zero-shot and few-shot performance and multi-lingual capabilities, with novel prompt fine-tuning and interpretability features.

Findings

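01 Outperforms existing pre-trained models of similar size on 18 monolingual Chinese tasks

02 Matches the performance of models up to 25 times larger

03 Outperforms existing multilingual language models pre-trained on 30 languages in multilingual and code-switching understanding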

Abstract

Large Language Models pre-trained with self-supervised learning have demonstrated impressive zero-shot generalization capabilities on a wide spectrum of tasks. In this work, we present WeLM: a well-read pre-trained language model for Chinese that is able to seamlessly perform different types of tasks with zero-shot or few-shot demonstrations. WeLM is trained with 10B parameters by "reading" a curated high-quality corpus covering a wide range of topics. We show that WeLM is equipped with broad knowledge of various domains and languages. On 18 monolingual (Chinese) tasks, WeLM can significantly outperform existing pre-trained models of similar size and match the performance of models up to 25 times larger. WeLM also exhibits strong capabilities in multilingual and code-switching understanding, outperforming existing multilingual language models pre-trained on 30 languages. Furthermore, we…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications