ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

Yu Sun; Shuohuan Wang; Shikun Feng; Siyu Ding; Chao Pang; Junyuan Shang; Jiaxiang Liu; Xuyi Chen; Yanbin Zhao; Yuxiang Lu; Weixin Liu; Zhihua Wu; Weibao Gong; Jianzhong Liang; Zhizhou Shang; Peng Sun; Wei Liu; Xuan Ouyang; Dianhai Yu; Hao Tian; Hua Wu; Haifeng Wang

arXiv:2107.02137 · cs.CL · July 6, 2021 · 194 citations

TL;DR

ERNIE 3.0 is a large-scale, knowledge-enhanced pre-trained model that combines auto-regressive and auto-encoding networks. It achieves state-of-the-art results on 54 Chinese NLP tasks and surpasses human performance on the English SuperGLUE benchmark.

Contribution

The paper introduces ERNIE 3.0, a unified framework that integrates knowledge enhancement with a hybrid training architecture for improved NLP performance.
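The hybrid design pairs an auto-encoding network, suited to language understanding, with an auto-regressive network, suited to generation. A central difference between the two is which positions each token is allowed to attend to. The sketch below illustrates only that difference; it is not the authors' code, and the function names (`bidirectional_mask`, `causal_mask`, `render`) are hypothetical.

```python
from typing import List

# A mask is a square grid: mask[i][j] is True when token i may attend to token j.
Mask = List[List[bool]]


def bidirectional_mask(seq_len: int) -> Mask:
    """Auto-encoding (understanding) style: every token sees the whole sequence."""
    return [[True] * seq_len for _ in range(seq_len)]


def causal_mask(seq_len: int) -> Mask:
    """Auto-regressive (generation) style: token i sees only positions j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def render(mask: Mask) -> str:
    """Draw a mask as rows of '1' (may attend) and '.' (blocked)."""
    return "\n".join("".join("1" if allowed else "." for allowed in row) for row in mask)


if __name__ == "__main__":
    print("auto-encoding (bidirectional):")
    print(render(bidirectional_mask(4)))
    print("auto-regressive (causal):")
    print(render(causal_mask(4)))
```

Running the script prints a full 4×4 grid of `1`s for the bidirectional mask and a lower-triangular grid for the causal one. The sketch does not reproduce ERNIE 3.0's actual network; it only shows why one branch can use context on both sides while the other can generate text left to right.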

Findings

1. Outperforms state-of-the-art models on 54 Chinese NLP tasks.
2. Ranks first on the SuperGLUE benchmark, surpassing human performance.
3. Trains a model with 10 billion parameters on a large corpus of plain text and a large-scale knowledge graph.

Abstract

Pre-trained models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. Recent works such as T5 and GPT-3 have shown that scaling up pre-trained language models can improve their generalization abilities. In particular, the GPT-3 model with 175 billion parameters shows strong task-agnostic zero-shot/few-shot learning capabilities. Despite their success, these large-scale models are trained on plain texts without introducing knowledge such as linguistic knowledge and world knowledge. In addition, most large-scale models are trained in an auto-regressive way. As a result, when adapted with the traditional fine-tuning approach, they show relatively weak performance on downstream language understanding tasks. To solve these problems, we propose a unified framework named ERNIE 3.0 for pre-training large-scale knowledge enhanced models.…
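The contrast between auto-regressive training and understanding-oriented (auto-encoding) training comes down to how each training example is built. The sketch below shows the two generic objectives: next-token prediction from left context only, and masked-token prediction from context on both sides. It is an illustration under stated assumptions, not ERNIE 3.0's procedure. The helper names, the `[MASK]` token string, and the 0.15 masking rate (a common BERT-style default) are assumptions, and ERNIE's own masking strategies are not reproduced.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"  # placeholder token; the name is an assumption for illustration


def autoregressive_pairs(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """Auto-regressive objective: predict each token from the tokens to its left."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_lm_example(
    tokens: List[str], mask_prob: float = 0.15, seed: Optional[int] = None
) -> Tuple[List[str], List[Optional[str]]]:
    """Auto-encoding objective: hide random tokens, predict them from both sides.

    Returns the corrupted input and a target list that holds the original token
    at each masked position and None everywhere else.
    """
    rng = random.Random(seed)
    inputs: List[str] = []
    targets: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)
        else:
            inputs.append(tok)
            targets.append(None)
    return inputs, targets


if __name__ == "__main__":
    sentence = "pre-trained models learn from plain text".split()
    for context, target in autoregressive_pairs(sentence):
        print(context, "->", target)
    print(masked_lm_example(sentence, mask_prob=0.3, seed=0))
```

In the auto-regressive pairs, the model never sees tokens to the right of its target, which suits generation but withholds context that understanding tasks can exploit. The masked examples keep both sides visible around each hidden token.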

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · ERNIE · Linear Layer · Weight Decay · Cosine Annealing · Gated Linear Unit · Inverse Square Root Schedule