GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator

Jian Yang; Shuming Ma; Li Dong; Shaohan Huang; Haoyang Huang; Yuwei Yin; Dongdong Zhang; Liqun Yang; Furu Wei; Zhoujun Li

arXiv:2212.10218·cs.CL·May 10, 2023·1 cites

TL;DR

GanLM introduces a GAN-inspired encoder-decoder pre-training framework with an auxiliary discriminator, enhancing both language understanding and generation, leading to state-of-the-art results in NLP benchmarks.

Contribution

The paper presents GanLM, a pre-training model that combines GAN principles with an encoder-decoder architecture to improve NLP performance.

Findings

01. GanLM outperforms existing pre-trained models on language generation benchmarks.

02. The auxiliary discriminator enhances language understanding and generation capabilities.

03. GanLM achieves state-of-the-art results in multiple NLP tasks.

Abstract

Pre-trained models have achieved remarkable success in natural language processing (NLP). However, existing pre-training methods underutilize the benefits of language understanding for generation. Inspired by the idea of Generative Adversarial Networks (GANs), we propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator, unifying the ability of language understanding and generation in a single model. Our model, named GanLM, is trained with two pre-training objectives: replaced token detection and replaced token denoising. Specifically, given masked source sentences, the generator outputs the target distribution and the discriminator predicts whether the target tokens sampled from that distribution are incorrect. The target sentence is replaced with misclassified tokens to construct a noisy previous context, which is used to generate the gold…
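The abstract is cut off above, so the following is only a minimal sketch of the two steps it does describe, not the authors' implementation. It assumes that "misclassified tokens" means positions where the discriminator's prediction disagrees with the true replaced/original label. The function names, the dictionary-based distributions, and the hard-coded discriminator predictions are all hypothetical and stand in for neural network outputs.

```python
import random


def sample_from_generator(dists, rng):
    """Sample one token per target position.

    `dists` is a list of {token: probability} dicts, a toy stand-in for
    the generator's output distribution at each masked position.
    """
    sampled = []
    for dist in dists:
        tokens = list(dist.keys())
        weights = list(dist.values())
        sampled.append(rng.choices(tokens, weights=weights, k=1)[0])
    return sampled


def detection_labels(sampled, gold):
    """Replaced token detection targets: 1 if the sampled token differs
    from the gold token at that position, else 0."""
    return [int(s != g) for s, g in zip(sampled, gold)]


def noisy_previous_context(gold, sampled, labels, disc_pred):
    """Build the noisy context for replaced token denoising.

    Assumption: a position counts as "misclassified" when the
    discriminator's prediction differs from the true label; at those
    positions the sampled token replaces the gold token.
    """
    return [
        s if pred != label else g
        for g, s, label, pred in zip(gold, sampled, labels, disc_pred)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    gold = ["the", "cat", "sat"]
    # Degenerate distributions keep the toy example deterministic.
    dists = [{"the": 1.0}, {"dog": 1.0}, {"sat": 1.0}]
    sampled = sample_from_generator(dists, rng)
    labels = detection_labels(sampled, gold)
    # Hypothetical discriminator output that misses the replaced token.
    disc_pred = [0, 0, 0]
    print(noisy_previous_context(gold, sampled, labels, disc_pred))
```

In this toy run the discriminator fails to flag "dog" as replaced, so "dog" stays in the context that the decoder would condition on while being trained to produce the gold sentence.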

Code & Models

Repositories

csjianyang/ganlm (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling