Dynamic Scheduled Sampling with Imitation Loss for Neural Text Generation
Xiang Lin, Prathyusha Jwalapuram, Shafiq Joty

TL;DR
This paper introduces DySI, a dynamic scheduled sampling method with imitation loss that improves neural text generation by reducing exposure bias and enhancing robustness across various models and tasks.
Contribution
The paper proposes a dynamic scheduled sampling approach that sets its schedule from training-time accuracy rather than training steps, and that adds an imitation loss to improve text generation quality.
Findings
Achieves notable improvements on machine translation benchmarks.
Significantly enhances robustness of text generation models.
Requires minimal tuning across different training setups.
Abstract
State-of-the-art neural text generation models are typically trained to maximize the likelihood of each token in the ground-truth sequence conditioned on the previous target tokens. However, during inference, the model needs to make a prediction conditioned on the tokens generated by itself. This train-test discrepancy is referred to as exposure bias. Scheduled sampling is a curriculum learning strategy that gradually exposes the model to its own predictions during training to mitigate this bias. Most of the proposed approaches design a scheduler based on training steps, which generally requires careful tuning depending on the training setup. In this work, we introduce Dynamic Scheduled Sampling with Imitation Loss (DySI), which maintains the schedule based solely on the training time accuracy, while enhancing the curriculum learning by introducing an imitation loss, which attempts to…
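The contrast between a step-based and an accuracy-based schedule can be illustrated with a small sketch. This is not the DySI implementation: the abstract does not state how accuracy is mapped to a sampling probability, so the clipped identity mapping below is an assumption, and the names `sampling_probability` and `mix_decoder_inputs` are hypothetical. The imitation loss is not shown, because its description is cut off in the abstract.

```python
import random


def sampling_probability(train_accuracy):
    """Assumed schedule: the probability of feeding the model its own
    prediction equals the current training accuracy, clipped to [0, 1].
    The actual mapping used by DySI is not given in the abstract."""
    return min(max(train_accuracy, 0.0), 1.0)


def mix_decoder_inputs(gold_tokens, predicted_tokens, train_accuracy, rng=None):
    """Per position, pick the model's prediction with probability p,
    otherwise keep the ground-truth token (generic scheduled sampling)."""
    if rng is None:
        rng = random.Random()
    p = sampling_probability(train_accuracy)
    return [
        pred if rng.random() < p else gold
        for gold, pred in zip(gold_tokens, predicted_tokens)
    ]


gold = ["the", "cat", "sat", "down"]
pred = ["the", "dog", "sat", "up"]

# Early in training (low accuracy) the inputs stay close to the ground truth;
# as accuracy rises, more of the model's own predictions are fed back in.
early_inputs = mix_decoder_inputs(gold, pred, train_accuracy=0.0)
late_inputs = mix_decoder_inputs(gold, pred, train_accuracy=1.0)
```

Because the schedule here depends only on an accuracy value, it needs no knowledge of the total number of training steps, which is the property the abstract highlights as reducing tuning effort.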
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
