UER: An Open-Source Toolkit for Pre-training Models
Zhe Zhao, Hui Chen, Jinbin Zhang, Xin Zhao, Tao Liu, Wei Lu, Xi Chen, Haotang Deng, Qi Ju, Xiaoyong Du

TL;DR
UER is an open-source toolkit that enables flexible assembly and deployment of various pre-training models for NLP, facilitating both reproduction of state-of-the-art models and exploration of new ones.
Contribution
It introduces a modular, assemble-on-demand framework for pre-training NLP models, supporting diverse configurations and enabling rapid development and deployment.
Findings
Built a model zoo with diverse pre-trained models
Achieved new state-of-the-art results on multiple NLP datasets
Demonstrated flexible assembly of pre-training components
Abstract
Existing works, including ELMo and BERT, have revealed the importance of pre-training for NLP tasks. Because no single pre-training model works best in all cases, a framework that can deploy various pre-training models efficiently is needed. For this purpose, we propose an assemble-on-demand pre-training toolkit, namely Universal Encoder Representations (UER). UER is loosely coupled and encapsulates a rich set of modules. By assembling modules on demand, users can either reproduce a state-of-the-art pre-training model or develop a pre-training model that remains unexplored. With UER, we have built a model zoo, which contains pre-trained models based on different corpora, encoders, and targets (objectives). With proper pre-trained models, we achieve new state-of-the-art results on a range of downstream datasets.
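The assemble-on-demand idea in the abstract can be illustrated with a minimal sketch. It assumes a simple registry design in which encoders and targets are interchangeable callables looked up by name. Every name here (`ENCODERS`, `TARGETS`, `assemble`, the toy encoders and targets) is hypothetical and chosen for illustration; none of it is UER's actual API, and the toy functions stand in for real encoders (e.g. Transformers) and real objectives (e.g. masked language modeling).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# NOTE: hypothetical illustration only; these names are not part of UER.
Vector = List[float]


def mean_pool_encoder(token_vectors: List[Vector]) -> Vector:
    """Toy encoder: average the token vectors dimension by dimension."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def last_token_encoder(token_vectors: List[Vector]) -> Vector:
    """Toy encoder: use the last token's vector as the representation."""
    return list(token_vectors[-1])


def sum_target(hidden: Vector) -> float:
    """Toy target: score a representation by summing its components."""
    return sum(hidden)


def max_target(hidden: Vector) -> float:
    """Toy target: score a representation by its largest component."""
    return max(hidden)


# Registries: adding a new module only requires a new entry here.
ENCODERS: Dict[str, Callable[[List[Vector]], Vector]] = {
    "mean_pool": mean_pool_encoder,
    "last_token": last_token_encoder,
}
TARGETS: Dict[str, Callable[[Vector], float]] = {
    "sum": sum_target,
    "max": max_target,
}


@dataclass
class AssembledModel:
    """A model is just a chosen encoder followed by a chosen target."""
    encoder: Callable[[List[Vector]], Vector]
    target: Callable[[Vector], float]

    def forward(self, token_vectors: List[Vector]) -> float:
        return self.target(self.encoder(token_vectors))


def assemble(encoder_name: str, target_name: str) -> AssembledModel:
    """Look up both modules by name and combine them; unknown names raise KeyError."""
    return AssembledModel(ENCODERS[encoder_name], TARGETS[target_name])


if __name__ == "__main__":
    tokens = [[1.0, 2.0], [3.0, 4.0]]
    print(assemble("mean_pool", "sum").forward(tokens))
    print(assemble("last_token", "max").forward(tokens))
```

Because each module is selected independently, two encoders and two targets already yield four combinations, which is the sense in which a loosely coupled toolkit can cover both existing and unexplored configurations.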
Code & Models
- 🤗 uer/albert-base-chinese-cluecorpussmall · model · 797 dl · ♡ 39
- 🤗 uer/albert-large-chinese-cluecorpussmall · model · 27 dl · ♡ 4
- 🤗 uer/bart-base-chinese-cluecorpussmall · model · 6.6k dl · ♡ 18
- 🤗 uer/chinese_roberta_L-10_H-128 · model · 3 dl · ♡ 1
- 🤗 uer/chinese_roberta_L-10_H-256 · model · 3 dl
- 🤗 uer/chinese_roberta_L-10_H-512 · model · 2 dl
- 🤗 uer/chinese_roberta_L-10_H-768 · model · 3 dl · ♡ 2
- 🤗 uer/chinese_roberta_L-12_H-128 · model · 3 dl · ♡ 1
- 🤗 uer/chinese_roberta_L-12_H-256 · model · 2 dl
- 🤗 uer/chinese_roberta_L-12_H-512 · model · 4 dl · ♡ 1
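The RoBERTa identifiers in this list carry an `L-<n>_H-<m>` suffix. Assuming, as is conventional for such names but not stated on this page, that `L` is the number of Transformer layers and `H` is the hidden size, a small helper can pull both values out of an identifier. The function name `parse_size` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches a trailing "_L-<layers>_H-<hidden>" suffix, e.g. "_L-10_H-128".
# Assumption: L = layer count, H = hidden size (inferred from the naming pattern).
_SIZE_SUFFIX = re.compile(r"_L-(\d+)_H-(\d+)$")


def parse_size(model_id: str) -> Optional[Tuple[int, int]]:
    """Return (layers, hidden_size) if the identifier has the suffix, else None."""
    match = _SIZE_SUFFIX.search(model_id)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    for name in ("uer/chinese_roberta_L-10_H-128",
                 "uer/bart-base-chinese-cluecorpussmall"):
        print(name, parse_size(name))
```

Identifiers without the suffix, such as the ALBERT and BART entries, return `None`, so the helper can be applied to the whole list without special-casing.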
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Sigmoid Activation · Tanh Activation · Weight Decay · Residual Connection · Adam · Layer Normalization · Attention Is All You Need · Dropout
