CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

Yunfan Shao; Zhichao Geng; Yitao Liu; Junqi Dai; Hang Yan; Fei Yang; Li Zhe; Hujun Bao; Xipeng Qiu

arXiv:2109.05729 · cs.CL · July 19, 2022



TL;DR

CPT is a novel Chinese pre-trained unbalanced Transformer that effectively combines understanding and generation tasks, improving performance and efficiency through shared architecture and multi-task pre-training.

Contribution

The paper introduces CPT, an unbalanced Transformer with a shared encoder and task-specific decoders, enabling efficient multi-task learning for Chinese NLP tasks.
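To make the storage argument for a shared encoder concrete, here is a minimal sketch that counts weight-matrix entries for one deep shared encoder with two shallow decoders, and compares that with giving each decoder its own encoder copy. Everything numeric is an assumption for illustration: the hidden size, the layer counts, the 4x feed-forward width, and the choice that only the generation decoder has cross-attention. Embeddings, biases, and layer norms are ignored. None of these values comes from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnbalancedConfig:
    # Hypothetical sizes chosen for illustration only.
    hidden: int = 768
    ffn_mult: int = 4
    shared_encoder_layers: int = 10
    understanding_decoder_layers: int = 2
    generation_decoder_layers: int = 2


def self_attention_layer_weights(cfg):
    """Weights in one layer with self-attention and a feed-forward block."""
    d = cfg.hidden
    attention = 4 * d * d                      # Q, K, V, and output projections
    feed_forward = 2 * d * (cfg.ffn_mult * d)  # two linear maps
    return attention + feed_forward


def cross_attention_layer_weights(cfg):
    """Same as above plus one cross-attention block (assumed for generation)."""
    return self_attention_layer_weights(cfg) + 4 * cfg.hidden * cfg.hidden


def module_weights(cfg):
    """Weight count for each of the three parts."""
    return {
        "shared_encoder": cfg.shared_encoder_layers
        * self_attention_layer_weights(cfg),
        "understanding_decoder": cfg.understanding_decoder_layers
        * self_attention_layer_weights(cfg),
        "generation_decoder": cfg.generation_decoder_layers
        * cross_attention_layer_weights(cfg),
    }


# Which parts a task touches: both routes reuse the same encoder.
ROUTES = {
    "understanding": ("shared_encoder", "understanding_decoder"),
    "generation": ("shared_encoder", "generation_decoder"),
}


def weights_used(cfg, task):
    """Weights on the path used by one task."""
    sizes = module_weights(cfg)
    return sum(sizes[name] for name in ROUTES[task])


cfg = UnbalancedConfig()
sizes = module_weights(cfg)
shared_total = sum(sizes.values())
# Hypothetical baseline: each decoder carries its own private encoder copy.
separate_total = weights_used(cfg, "understanding") + weights_used(cfg, "generation")

print("shared encoder design:", shared_total)
print("two private encoders: ", separate_total)
print("saved by sharing:     ", separate_total - shared_total)
```

Under these assumptions, the saving equals exactly one encoder stack, and because the encoder is the deep part, it dominates the total. Only the shallow decoder differs between the two routes.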

Findings

1. Outperforms previous models on Chinese NLU and NLG tasks
2. Reduces computational and storage costs
3. Accelerates inference speed for text generation

Abstract

In this paper, we take advantage of previous pre-trained models (PTMs) and propose a novel Chinese Pre-trained Unbalanced Transformer (CPT). Unlike previous Chinese PTMs, CPT is designed to exploit the knowledge shared between natural language understanding (NLU) and natural language generation (NLG) to boost performance. CPT consists of three parts: a shared encoder, an understanding decoder, and a generation decoder. The two task-specific decoders, which share the encoder, are pre-trained with masked language modeling (MLM) and denoising auto-encoding (DAE) tasks, respectively. With this partially shared architecture and multi-task pre-training, CPT can (1) learn knowledge specific to both NLU and NLG tasks through its two decoders and (2) be fine-tuned flexibly, in ways that fully exploit the potential of the model. Moreover, the unbalanced Transformer saves the computational and storage cost,…
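The two pre-training objectives named in the abstract differ mainly in what the model is asked to predict. The sketch below builds one training example for each from the same corrupted input: MLM scores only the masked positions, while DAE asks for the whole original sequence back. The corruption used here (independent per-token masking) and the helper names `mlm_example` and `dae_example` are simplifying assumptions for illustration, not necessarily the noising scheme or code used by the authors.

```python
import random

MASK = "[MASK]"


def mlm_example(tokens, mask_prob, rng):
    """Masked language modeling: predict originals at masked positions only.

    Returns (corrupted_input, targets); a target of None means the position
    contributes no loss.
    """
    corrupted, targets = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets.append(token)
        else:
            corrupted.append(token)
            targets.append(None)
    return corrupted, targets


def dae_example(tokens, mask_prob, rng):
    """Denoising auto-encoding: reconstruct the full original sequence.

    The input is corrupted the same way as for MLM (an assumption made for
    brevity), but the target is every original token, to be generated in order.
    """
    corrupted, _ = mlm_example(tokens, mask_prob, rng)
    return corrupted, list(tokens)


tokens = list("自然语言处理")  # character-level tokens, for illustration
mlm_input, mlm_targets = mlm_example(tokens, 0.3, random.Random(0))
dae_input, dae_targets = dae_example(tokens, 0.3, random.Random(0))

print("MLM input:  ", mlm_input)
print("MLM targets:", mlm_targets)
print("DAE input:  ", dae_input)
print("DAE targets:", dae_targets)
```

In the design described above, an MLM-style example would be routed through the shared encoder and the understanding decoder, and a DAE-style example through the shared encoder and the generation decoder.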


Code & Models

Repositories

fastnlp/cpt (PyTorch, official)



Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Chinese Pre-trained Unbalanced Transformer · Dropout · Dense Connections · Label Smoothing · Residual Connection