ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
Yu Sun, Shuohuan Wang, Shikun Feng, Siyu Ding, Chao Pang, Junyuan Shang, Jiaxiang Liu, Xuyi Chen, Yanbin Zhao, Yuxiang Lu, Weixin Liu, Zhihua Wu, Weibao Gong, Jianzhong Liang, Zhizhou Shang, Peng Sun, Wei Liu, Xuan Ouyang, Dianhai Yu, Hao Tian, Hua Wu, Haifeng Wang

TL;DR
ERNIE 3.0 is a large-scale, knowledge-enhanced pre-trained model that combines auto-regressive and auto-encoding architectures. It achieves state-of-the-art results on a wide range of NLP tasks and surpasses human performance on SuperGLUE.
Contribution
The paper introduces ERNIE 3.0, a unified framework that integrates knowledge enhancement with a hybrid training architecture for improved NLP performance.
Findings
- Outperforms state-of-the-art models on 54 Chinese NLP tasks.
- The English version achieves first place on the SuperGLUE benchmark, surpassing the human performance baseline.
- Scales to 10 billion parameters, trained on a 4TB corpus of plain text and a large-scale knowledge graph.
Abstract
Pre-trained models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. Recent works such as T5 and GPT-3 have shown that scaling up pre-trained language models can improve their generalization abilities. Particularly, the GPT-3 model with 175 billion parameters shows its strong task-agnostic zero-shot/few-shot learning capabilities. Despite their success, these large-scale models are trained on plain texts without introducing knowledge such as linguistic knowledge and world knowledge. In addition, most large-scale models are trained in an auto-regressive way. As a result, this kind of traditional fine-tuning approach demonstrates relatively weak performance when solving downstream language understanding tasks. In order to solve the above problems, we propose a unified framework named ERNIE 3.0 for pre-training large-scale knowledge enhanced models.…
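The abstract contrasts auto-regressive training (as in GPT-3) with the auto-encoding objectives typically used for language understanding. One common way to express that difference is through the attention mask that decides which positions each token may attend to. The sketch below is a generic illustration, not ERNIE 3.0's implementation: the function names are hypothetical, and the prefix-style hybrid mask is only one assumed way a single network can mix bidirectional and left-to-right attention.

```python
import math
from typing import List

Mask = List[List[int]]  # mask[i][j] == 1 means token i may attend to token j


def bidirectional_mask(n: int) -> Mask:
    """Auto-encoding (BERT-style) mask: every token attends to every token."""
    return [[1] * n for _ in range(n)]


def causal_mask(n: int) -> Mask:
    """Auto-regressive (GPT-style) mask: token i attends only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def prefix_lm_mask(n: int, prefix_len: int) -> Mask:
    """Hypothetical hybrid mask: bidirectional inside the first `prefix_len`
    tokens, causal afterwards (later tokens still see the whole prefix)."""
    return [[1 if (j < prefix_len or j <= i) else 0 for j in range(n)] for i in range(n)]


def masked_softmax(scores: List[float], mask_row: List[int]) -> List[float]:
    """Softmax over one row of attention scores; masked positions get weight 0."""
    masked = [s if m else float("-inf") for s, m in zip(scores, mask_row)]
    top = max(masked)  # assumes at least one unmasked position (the diagonal)
    exps = [math.exp(s - top) for s in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Two prefix tokens see each other; the remaining three decode left to right.
    for row in prefix_lm_mask(5, 2):
        print(row)
```

Swapping `bidirectional_mask` for `causal_mask` is the only change needed to turn the same Transformer layer from an understanding-style encoder into a left-to-right generator, which is why the mask is a convenient lens on the auto-encoding versus auto-regressive distinction.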
Code & Models
- 🤗 swtx/ernie-3.0-base-chinese (model) · 82 dl · ♡ 17
- 🤗 nghuyong/ernie-3.0-base-zh (model) · 3.8k dl · ♡ 112
- 🤗 nghuyong/ernie-3.0-medium-zh (model) · 1.2k dl · ♡ 8
- 🤗 nghuyong/ernie-3.0-mini-zh (model) · 205 dl · ♡ 3
- 🤗 nghuyong/ernie-3.0-micro-zh (model) · 13 dl · ♡ 2
- 🤗 nghuyong/ernie-3.0-nano-zh (model) · 398 dl · ♡ 25
- 🤗 IDEA-CCNL/Erlangshen-UniMC-RoBERTa-110M-Chinese (model) · 5 dl · ♡ 6
- 🤗 IDEA-CCNL/Erlangshen-UniMC-RoBERTa-330M-Chinese (model) · 20 dl · ♡ 3
- 🤗 IDEA-CCNL/Erlangshen-UniMC-MegatronBERT-1.3B-Chinese (model) · 31 dl · ♡ 8
- 🤗 nghuyong/ernie-3.0-xbase-zh (model) · 13k dl · ♡ 23
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · ERNIE · Linear Layer · Weight Decay · Cosine Annealing · Gated Linear Unit · Inverse Square Root Schedule
