Generative Bridging Network in Neural Sequence Prediction
Wenhu Chen, Guanlin Li, Shuo Ren, Shujie Liu, Zhirui Zhang, Mu Li, Ming Zhou

TL;DR
This paper introduces the Generative Bridging Network (GBN). Instead of training a sequence prediction model on the point-wise ground truth alone, as MLE does, GBN trains it against a bridge distribution built around the ground truth, which alleviates data sparsity and overfitting.
Contribution
The paper proposes GBN, which extends MLE by training the generator to match a bridge distribution conditioned on the ground truth. It instantiates GBN in three variants: uniform GBN, language-model GBN and coaching GBN.
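For orientation, the training objective described in the Abstract can be written out as below. The notation is ours and may differ from the paper's: $x$ is the source, $y^*$ the ground truth, $p_\eta(y \mid x)$ the generator and $p_\beta(y \mid y^*)$ the bridge. We assume the KL-divergence is taken from the bridge to the generator, which turns the objective into an expectation under the bridge.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{MLE}}(\eta) &= -\log p_\eta(y^* \mid x) \\
\mathcal{L}_{\mathrm{GBN}}(\eta) &= \mathrm{KL}\big(p_\beta(y \mid y^*) \,\|\, p_\eta(y \mid x)\big) \\
  &= -\,\mathbb{E}_{y \sim p_\beta(\cdot \mid y^*)}\big[\log p_\eta(y \mid x)\big] - H\big(p_\beta(\cdot \mid y^*)\big)
\end{aligned}
```

The entropy term $H$ does not depend on $\eta$, so minimizing $\mathcal{L}_{\mathrm{GBN}}$ amounts to minimizing the generator's expected negative log-likelihood on targets drawn from the bridge. If the bridge is a point mass on $y^*$, then $H = 0$ and $\mathcal{L}_{\mathrm{GBN}}$ reduces exactly to $\mathcal{L}_{\mathrm{MLE}}$.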
Findings
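GBN yields significant improvements over strong baselines on machine translation and abstractive text summarization.
The bridge type shapes the generator's behavior: the uniform bridge penalizes confidence, the language-model bridge enhances language smoothness, and the coaching bridge relieves the generator's learning burden.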
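The confidence-penalizing effect of a uniform bridge can be illustrated with a simplified, hypothetical token-level bridge. At each position it keeps probability `1 - eps` on the ground-truth token and spreads `eps` uniformly over the vocabulary, which is the same target used by label smoothing. This per-token factorization, the parameter `eps` and all function names are our assumptions for illustration, not the paper's construction. The sketch compares the MLE loss with the KL to this bridge for a moderately confident and a very peaked generator.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a single position's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def uniform_bridge(target: int, vocab_size: int, eps: float) -> List[float]:
    """Hypothetical token-level bridge: 1 - eps on the ground truth,
    eps spread uniformly over the whole vocabulary."""
    probs = [eps / vocab_size] * vocab_size
    probs[target] += 1.0 - eps
    return probs


def kl(p: List[float], q: List[float]) -> float:
    """KL(p || q) in nats; entries with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def token_losses(logits: List[float], target: int, eps: float) -> Tuple[float, float]:
    """Return (MLE loss, KL from the uniform bridge to the generator) for one position."""
    q = softmax(logits)
    mle = -math.log(q[target])
    gbn = kl(uniform_bridge(target, len(logits), eps), q)
    return mle, gbn


if __name__ == "__main__":
    # Two generators that both put most of their mass on the correct token (index 0).
    moderate = [2.0, 0.0, 0.0, 0.0]
    peaked = [12.0, 0.0, 0.0, 0.0]
    for name, logits in [("moderate", moderate), ("peaked", peaked)]:
        mle, gbn = token_losses(logits, target=0, eps=0.1)
        print(f"{name}: MLE={mle:.4f}  KL-to-bridge={gbn:.4f}")
```

MLE keeps rewarding the peaked generator, whereas the KL to the smoothed bridge is larger for it than for the moderate one, because the bridge expects some probability mass on the other tokens. With `eps = 0` the bridge is a point mass and the two losses coincide.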
Abstract
In order to alleviate data sparsity and overfitting problems in maximum likelihood estimation (MLE) for sequence prediction tasks, we propose the Generative Bridging Network (GBN), in which a novel bridge module is introduced to assist the training of the sequence prediction model (the generator network). Unlike MLE, which directly maximizes the conditional likelihood, the bridge extends the point-wise ground truth to a bridge distribution conditioned on it, and the generator is optimized to minimize the KL-divergence between the two. Three different GBNs, namely uniform GBN, language-model GBN and coaching GBN, are proposed to penalize confidence, enhance language smoothness and relieve the learning burden, respectively. Experiments conducted on two well-established sequence prediction tasks (machine translation and abstractive text summarization) show that our proposed GBNs can yield significant improvements over strong…
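Because the KL objective is an expectation under the bridge, it can be estimated by sampling targets from the bridge and averaging the generator's negative log-likelihood on them. The sketch below shows that loop under loudly stated assumptions: the bridge sampler (independent uniform token substitution with a hypothetical `swap_prob`) and the generator (a fixed per-position probability table instead of a network conditioned on the source) are toy stand-ins we made up, and none of the names come from the paper.

```python
import math
import random
from functools import partial
from typing import Callable, List, Sequence

Sentence = List[int]


def uniform_substitution_bridge(y_star: Sentence, vocab_size: int,
                                swap_prob: float, rng: random.Random) -> Sentence:
    """Hypothetical bridge sampler: each ground-truth token is independently
    replaced by a uniformly random token with probability swap_prob."""
    return [rng.randrange(vocab_size) if rng.random() < swap_prob else tok
            for tok in y_star]


def toy_generator_log_prob(y: Sentence, table: Sequence[Sequence[float]]) -> float:
    """Toy stand-in for log p_eta(y | x): a fixed per-position distribution table.
    A real generator would be an autoregressive network conditioned on x."""
    return sum(math.log(table[t][tok]) for t, tok in enumerate(y))


def bridged_loss(y_star: Sentence, log_prob: Callable[[Sentence], float],
                 sampler: Callable[[Sentence], Sentence], num_samples: int) -> float:
    """Monte Carlo estimate of -E_{y ~ bridge}[log p_eta(y | x)], which equals
    KL(bridge || generator) up to the bridge entropy (a constant in eta)."""
    total = 0.0
    for _ in range(num_samples):
        total -= log_prob(sampler(y_star))
    return total / num_samples


if __name__ == "__main__":
    rng = random.Random(0)
    # Per-position generator probabilities over a 3-token vocabulary.
    table = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]
    y_star = [0, 1]
    log_prob = partial(toy_generator_log_prob, table=table)
    mle = -log_prob(y_star)  # point-mass bridge, i.e. plain MLE
    sampler = partial(uniform_substitution_bridge, vocab_size=3, swap_prob=0.2, rng=rng)
    gbn = bridged_loss(y_star, log_prob, sampler, num_samples=1000)
    print(f"MLE loss={mle:.4f}  bridged loss={gbn:.4f}")
```

Setting `swap_prob=0.0` makes every sample equal to `y_star`, so the estimate collapses to the MLE loss. In training, the gradient of this estimate with respect to the generator's parameters would replace the MLE gradient; a language-model or coaching bridge would only change the sampler.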
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification
