Rethinking Exposure Bias In Language Modeling
Yifan Xu, Kening Zhang, Haoyu Dong, Yuezhou Sun, Wenlong Zhao, Zhuowen Tu

TL;DR
This paper addresses exposure bias in language models with two strategies that strengthen the sparse reward signals of RL and GAN training. The result is higher BLEU scores and greater robustness on road exam, a new metric the authors designed to measure robustness against exposure bias.
Contribution
It adopts two simple strategies, multi-range reinforcing and multi-entropy sampling, to amplify and denoise reward signals in RL and GAN training of language models.
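The summary does not spell out how multi-entropy sampling works. As a generic illustration only, one common way to draw samples at several entropy levels from a language model is to rescale its next-token logits by a softmax temperature. The sketch below assumes that reading; the logits, temperatures, and function names (`softmax_with_temperature`, `sample_at_temperatures`) are hypothetical and are not taken from the paper.

```python
import math
import random
from typing import Dict, List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Softmax of logits / temperature; a higher temperature flattens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sample_at_temperatures(
    logits: Sequence[float], temperatures: Sequence[float], rng: random.Random
) -> Dict[float, int]:
    """Hypothetical helper: draw one token index per temperature (one per entropy level)."""
    samples: Dict[float, int] = {}
    for t in temperatures:
        probs = softmax_with_temperature(logits, t)
        samples[t] = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    return samples


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # hypothetical next-token scores over a 3-token vocabulary
    temps = [0.5, 1.0, 2.0]
    for t in temps:
        print(t, round(entropy(softmax_with_temperature(logits, t)), 3))
    print(sample_at_temperatures(logits, temps, random.Random(0)))
```

Raising the temperature flattens the distribution and raises its entropy, so a set of temperatures yields samples ranging from near-greedy to exploratory. Whether the paper varies entropy this way is an assumption of this sketch.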
Findings
Improved BLEU scores over competing models.
Enhanced robustness as measured by the road exam metric.
Effective mitigation of exposure bias through the proposed strategies.
Abstract
Exposure bias describes the phenomenon that a language model trained under the teacher-forcing scheme may perform poorly at inference, when its predictions are conditioned on its own previous predictions, which may never appear in the training corpus. Recently, several generative adversarial network (GAN) and reinforcement learning (RL) methods have been introduced to alleviate this problem. However, a common issue in RL and GAN training is the sparsity of reward signals. In this paper, we adopt two simple strategies, multi-range reinforcing and multi-entropy sampling, to amplify and denoise the reward signal. Our model improves over competing models with regard to BLEU scores and road exam, a new metric we designed to measure robustness against exposure bias in language models.
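To make the definition of exposure bias concrete, here is a hypothetical toy example that is not drawn from the paper: a lookup-table "model" (`TOY_MODEL`) that conditions only on the previous token, trained on one gold sentence (`GOLD`). Under teacher forcing it always conditions on gold prefixes; when it runs on its own outputs, a single early mistake puts it in a context it never saw during training.

```python
from typing import Dict, List

# Hypothetical toy "model": a lookup table from the previous token to the
# predicted next token. Real language models condition on the whole prefix.
TOY_MODEL: Dict[str, str] = {
    "<s>": "the",
    "the": "dog",  # mistake: the gold continuation is "cat"
    "cat": "sat",
    "sat": "</s>",
}

# Hypothetical gold training sentence.
GOLD: List[str] = ["<s>", "the", "cat", "sat", "</s>"]


def teacher_forcing_contexts(gold: List[str]) -> List[str]:
    """Contexts used in training: each step conditions on the gold previous token."""
    return gold[:-1]


def free_running_contexts(model: Dict[str, str], start: str, max_len: int) -> List[str]:
    """Contexts used at inference: each step conditions on the model's own output."""
    contexts: List[str] = []
    token = start
    for _ in range(max_len):
        contexts.append(token)
        if token not in model:
            # The model has reached a context it never saw in training.
            break
        token = model[token]
        if token == "</s>":
            break
    return contexts


if __name__ == "__main__":
    train_ctx = teacher_forcing_contexts(GOLD)
    infer_ctx = free_running_contexts(TOY_MODEL, "<s>", max_len=10)
    unseen = [c for c in infer_ctx if c not in train_ctx]
    print("training contexts: ", train_ctx)  # ['<s>', 'the', 'cat', 'sat']
    print("inference contexts:", infer_ctx)  # ['<s>', 'the', 'dog']
    print("unseen at training:", unseen)     # ['dog']
```

The context "dog" never occurs under teacher forcing, so nothing in training tells the model how to continue from it; this train/inference mismatch is the gap that RL and GAN objectives try to close by training on the model's own samples.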
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
