TL;DR
CPT is a Chinese pre-trained unbalanced Transformer that serves both understanding and generation tasks, improving performance and efficiency through a partially shared architecture and multi-task pre-training.
Contribution
The paper introduces CPT, an unbalanced Transformer with a shared encoder and two task-specific decoders, enabling efficient multi-task learning for Chinese NLP tasks.
Findings
Outperforms previous models on Chinese NLU and NLG tasks
Reduces computational and storage costs
Accelerates inference for text generation
Abstract
In this paper, we take advantage of previous pre-trained models (PTMs) and propose a novel Chinese Pre-trained Unbalanced Transformer (CPT). Unlike previous Chinese PTMs, CPT is designed to utilize the knowledge shared between natural language understanding (NLU) and natural language generation (NLG) to boost performance. CPT consists of three parts: a shared encoder, an understanding decoder, and a generation decoder. The two task-specific decoders, which sit on top of the shared encoder, are pre-trained with masked language modeling (MLM) and denoising auto-encoding (DAE) tasks, respectively. With the partially shared architecture and multi-task pre-training, CPT can (1) learn task-specific knowledge of both NLU and NLG with its two decoders and (2) be fine-tuned flexibly in ways that fully exploit the potential of the model. Moreover, the unbalanced Transformer saves the computational and storage cost,…
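The two pre-training objectives named in the abstract can be illustrated with a small, library-free sketch. It shows how one token sequence could yield an MLM target (for the understanding decoder) and a DAE target (for the generation decoder). The token-level masking, the 15% ratio, the `[MASK]` symbol, the function names, and the choice to feed the same corrupted input to both decoders are all simplifying assumptions for illustration, not details taken from the abstract.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol (assumption)


def corrupt(tokens, mask_ratio, rng):
    """Replace a random subset of tokens with MASK.

    Returns the corrupted token list and the sorted masked positions.
    """
    if not tokens:
        return [], []
    n_mask = max(1, round(len(tokens) * mask_ratio))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    for i in positions:
        corrupted[i] = MASK
    return corrupted, positions


def build_examples(tokens, mask_ratio=0.15, seed=0):
    """Build hypothetical MLM and DAE training targets from one sequence."""
    rng = random.Random(seed)
    corrupted, positions = corrupt(tokens, mask_ratio, rng)
    return {
        # The shared encoder reads the corrupted sequence.
        "encoder_input": corrupted,
        # MLM: the understanding decoder predicts only the masked positions.
        "mlm_targets": {i: tokens[i] for i in positions},
        # DAE: the generation decoder reconstructs the full original sequence.
        "dae_targets": list(tokens),
    }


if __name__ == "__main__":
    example = build_examples(list("自然语言处理"))
    for name, value in example.items():
        print(name, value)
```

The contrast to notice is in the targets: the MLM objective scores only the masked positions, while the DAE objective scores the whole reconstructed sequence.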
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Chinese Pre-trained Unbalanced Transformer · Dropout · Dense Connections · Label Smoothing · Residual Connection
