GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator
Jian Yang, Shuming Ma, Li Dong, Shaohan Huang, Haoyang Huang, Yuwei Yin, Dongdong Zhang, Liqun Yang, Furu Wei, Zhoujun Li

TL;DR
GanLM is a GAN-inspired encoder-decoder pre-training framework that adds an auxiliary discriminator to unify language understanding and generation in a single model, with reported state-of-the-art results on NLP benchmarks.
Contribution
The paper presents GanLM, a pre-training model that combines GAN principles with an encoder-decoder architecture and is trained with two objectives: replaced token detection and replaced token denoising.
Findings
GanLM outperforms existing pre-trained models on language generation benchmarks.
The auxiliary discriminator enhances language understanding and generation capabilities.
GanLM achieves state-of-the-art results in multiple NLP tasks.
Abstract
Pre-trained models have achieved remarkable success in natural language processing (NLP). However, existing pre-training methods underutilize the benefits of language understanding for generation. Inspired by the idea of Generative Adversarial Networks (GANs), we propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator, unifying the ability of language understanding and generation in a single model. Our model, named GanLM, is trained with two pre-training objectives: replaced token detection and replaced token denoising. Specifically, given masked source sentences, the generator outputs the target distribution, and the discriminator predicts whether the target tokens sampled from that distribution are incorrect. The target sentence is replaced with misclassified tokens to construct a noisy previous context, which is used to generate the gold…
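The data flow described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the function names, the dictionary-based generator distribution, and the hand-written discriminator predictions are all hypothetical, and no neural network is involved. It also assumes one reading of "misclassified tokens", namely that a sampled token is kept in the noisy context wherever the discriminator's prediction disagrees with the true replaced/original label.

```python
import random

def sample_replacements(gold, masked_positions, generator_probs, rng):
    """Sample one token per masked position from a toy generator distribution.

    generator_probs maps a position to {token: probability} (hypothetical format).
    """
    sampled = list(gold)
    for pos in masked_positions:
        tokens, weights = zip(*generator_probs[pos].items())
        sampled[pos] = rng.choices(tokens, weights=weights, k=1)[0]
    return sampled

def detection_labels(gold, sampled):
    """Replaced token detection targets: 1 where the sampled token differs from gold."""
    return [int(s != g) for s, g in zip(sampled, gold)]

def noisy_context(gold, sampled, predicted_labels):
    """Build the noisy previous context (assumed reading of the abstract).

    A sampled token is kept wherever the discriminator's prediction is wrong;
    everywhere else the gold token is used.
    """
    true_labels = detection_labels(gold, sampled)
    return [
        s if p != t else g
        for g, s, t, p in zip(gold, sampled, true_labels, predicted_labels)
    ]

# Toy example with a deterministic "generator" so the outcome is predictable.
gold = ["the", "cat", "sat", "down"]
generator_probs = {1: {"dog": 1.0}, 3: {"down": 1.0}}
rng = random.Random(0)

sampled = sample_replacements(gold, [1, 3], generator_probs, rng)
labels = detection_labels(gold, sampled)          # [0, 1, 0, 0]

# Hypothetical discriminator output that misses the replaced token at position 1.
predicted = [0, 0, 0, 0]
context = noisy_context(gold, sampled, predicted)  # ["the", "dog", "sat", "down"]
```

In this sketch the replaced token denoising objective would train the decoder to produce the gold sentence while conditioning on `context` instead of the clean target prefix; the loss functions and model architecture are outside what the truncated abstract specifies and are left out.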
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
