Curriculum-Based Neighborhood Sampling For Sequence Prediction
James O'Neill, Danushka Bollegala

TL;DR
This paper introduces a curriculum learning approach called Nearest-Neighbor Replacement Sampling to reduce exposure bias in language models, improving multi-step prediction accuracy by gradually introducing stochasticity during training.
Contribution
It proposes a novel curriculum learning method that replaces inputs with similar neighbors to better handle exposure bias in sequence prediction models.
Findings
Improves performance on language modeling benchmarks.
Works well with scheduled sampling to reduce compounding errors.
Requires minimal additional memory.
Abstract
The task of multi-step ahead prediction in language models is challenging considering the discrepancy between training and testing. At test time, a language model is required to make predictions given past predictions as input, instead of the past targets that are provided during training. This difference, known as exposure bias, can lead to the compounding of errors along a generated sequence at test time. In order to improve generalization in neural language models and address compounding errors, we propose a curriculum learning based method that gradually changes an initially deterministic teacher policy to a gradually more stochastic policy, which we refer to as Nearest-Neighbor Replacement Sampling. A chosen input at a given timestep is replaced with a sampled nearest neighbor of the past target with a truncated probability proportional to the cosine similarity between…
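The replacement step described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract is cut off before it finishes defining the sampling probability, so the code assumes that "truncated" means restricting candidates to the top-k neighbors, that similarity is measured between embedding vectors, and that the curriculum is a linear ramp. The names `nearest_neighbors`, `replacement_rate`, `sample_input`, `k`, and `max_rate` are all hypothetical.

```python
import numpy as np

def nearest_neighbors(embeddings, token_id, k):
    """Return ids and cosine similarities of the k nearest neighbors of token_id."""
    # Normalize rows so that dot products equal cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sims = unit @ unit[token_id]
    sims[token_id] = -np.inf  # never return the token itself
    top = np.argsort(-sims)[:k]
    return top, sims[top]

def replacement_rate(step, total_steps, max_rate=0.5):
    """Assumed linear curriculum: 0 at the start, max_rate at the end."""
    return max_rate * min(step / total_steps, 1.0)

def sample_input(embeddings, target_id, step, total_steps,
                 k=5, max_rate=0.5, rng=None):
    """Return the input token for one timestep during training."""
    if rng is None:
        rng = np.random.default_rng()
    # Early in training the rate is near 0, so the true target is almost
    # always fed back (deterministic teacher policy).
    if rng.random() >= replacement_rate(step, total_steps, max_rate):
        return target_id
    ids, sims = nearest_neighbors(embeddings, target_id, k)
    # Assumption: negative similarities get zero probability mass.
    weights = np.clip(sims, 0.0, None)
    if weights.sum() == 0.0:
        return target_id
    # Sample a neighbor with probability proportional to cosine similarity.
    return int(rng.choice(ids, p=weights / weights.sum()))
```

In a training loop, `sample_input` would be called once per timestep in place of feeding the previous target directly, so the policy stays fully teacher-forced at step 0 and becomes more stochastic as `step` approaches `total_steps`.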
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
