E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation
Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao

TL;DR
E2S2 introduces an encoding-focused self-supervised pretraining strategy for seq2seq models, enhancing their language understanding and generation capabilities by improving encoder representations.
Contribution
The paper proposes a novel encoding-enhanced pretraining method, E2S2, which integrates denoising and contrastive objectives into the encoder to improve seq2seq model performance.
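As a rough illustration of what adding encoder-side objectives to a seq2seq loss can look like, the sketch below combines a decoder reconstruction loss with a denoising term and a contrastive term. It is not the authors' implementation: the InfoNCE form of the contrastive loss, the weights `w_denoise` and `w_contrast`, and all function names are assumptions made for this example only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss (an assumed choice of contrastive objective).

    anchors[i] should match positives[i]; the other positives in the
    batch act as negatives.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with the max subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


def token_cross_entropy(probs, targets):
    """Mean negative log-likelihood of the target token ids.

    probs[k] is a probability distribution over the vocabulary at
    position k; targets[k] is the id of the original (uncorrupted) token.
    """
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)


def combined_loss(seq2seq_loss, denoise_loss, contrastive_loss,
                  w_denoise=1.0, w_contrast=1.0):
    """Weighted sum of the decoder loss and two encoder-side losses.

    The weights are hypothetical hyperparameters, not values from the paper.
    """
    return seq2seq_loss + w_denoise * denoise_loss + w_contrast * contrastive_loss
```

In this toy setup, the denoising term would be computed from encoder predictions on corrupted input tokens and the contrastive term from pooled encoder representations of two views of the same sentence. How the paper actually builds those inputs is not specified in this summary.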
Findings
Achieves a 1.1% average gain on the GLUE benchmark
Improves the F0.5 score by 1.75% on the CoNLL2014 grammatical error correction task
Enhances linguistic representations in seq2seq models
Abstract
Sequence-to-sequence (seq2seq) learning is a popular approach to large-scale pretraining of language models. However, prior seq2seq pretraining models generally focus on reconstructive objectives on the decoder side and neglect the effect of encoder-side supervision, which we argue may lead to sub-optimal performance. To verify our hypothesis, we first empirically study the functionalities of the encoder and decoder in seq2seq pretrained language models, and find that the encoder plays an important but under-exploited role compared with the decoder in terms of downstream performance and neuron activation. Therefore, we propose an encoding-enhanced seq2seq pretraining strategy, namely E2S2, which improves seq2seq models by integrating more efficient self-supervised information into the encoders. Specifically, E2S2 adopts two self-supervised objectives on the encoder side from two…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Inverse Square Root Schedule · Gated Linear Unit · Adafactor · Attention Dropout · SentencePiece · T5 · Linear Layer
