Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning

Shaohua Wu; Xudong Zhao; Tong Yu; Rongguo Zhang; Chong Shen; Hongli Liu; Feng Li; Hong Zhu; Jiangang Luo; Liang Xu; Xuanwei Zhang

arXiv:2110.04725·cs.CL·October 13, 2021·25 cites

TL;DR

Yuan 1.0 is a large-scale pre-trained language model with 245 billion parameters that achieves state-of-the-art results in zero-shot and few-shot NLP tasks, utilizing innovative training, data filtering, and calibration techniques.

Contribution

The paper introduces Yuan 1.0, the largest singleton language model, with novel methods for distributed training, data processing, and performance calibration to enhance zero-shot and few-shot learning.

Findings

01

Yuan 1.0 achieves state-of-the-art NLP performance.

02

Efficient data filtering enables building a 5TB high-quality Chinese corpus.

03

Calibration methods improve zero-shot and few-shot task accuracy.

Abstract

Recent work like GPT-3 has demonstrated excellent performance in Zero-Shot and Few-Shot learning on many natural language processing (NLP) tasks by scaling up model size, dataset size, and the amount of computation. However, training a model like GPT-3 requires a huge amount of computational resources, which makes it challenging for researchers. In this work, we propose a method that incorporates large-scale distributed training performance into model architecture design. With this method, Yuan 1.0, the current largest singleton language model with 245B parameters, achieves excellent performance on thousands of GPUs during training and state-of-the-art results on NLP tasks. A data processing method is designed to efficiently filter massive amounts of raw data. The current largest high-quality Chinese corpus, with 5TB of high-quality text, is built based on this method. In addition, a…

Code & Models

Repositories

Shawn-Inspur/Yuan-1.0
pytorch

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Adam · Linear Warmup With Cosine Annealing · Softmax