$k$-Neighbor Based Curriculum Sampling for Sequence Prediction
James O'Neill, Danushka Bollegala

TL;DR
This paper introduces a curriculum learning method called Nearest-Neighbor Replacement Sampling to reduce exposure bias in language models, improving multi-step prediction accuracy by exploring similar token alternatives during training.
Contribution
It proposes a novel, simple, online sampling technique that replaces tokens with similar neighbors to enhance generalization and reduce errors in sequence prediction models.
Findings
Improves language model performance on benchmarks.
Enhances robustness when combined with scheduled sampling.
Requires minimal additional memory.
Abstract
Multi-step ahead prediction in language models is challenging due to the discrepancy between training and test time processes. At test time, a sequence predictor is required to make predictions given past predictions as the input, instead of the past targets that are provided during training. This difference, known as exposure bias, can lead to the compounding of errors along a generated sequence at test time. To improve generalization in neural language models and address compounding errors, we propose \textit{Nearest-Neighbor Replacement Sampling} -- a curriculum learning-based method that gradually changes an initially deterministic teacher policy to a stochastic policy. A token at a given time-step is replaced with a sampled nearest neighbor of the past target with a truncated probability proportional to the cosine similarity between the original word and its top most similar…
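The replacement step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the linear schedule, the `max_rate` value, and the choice to truncate by clipping negative cosine similarities to zero are all hypothetical, since the abstract is cut off before those details are given.

```python
import numpy as np


def top_k_neighbors(embeddings, k):
    """Return (indices, cosine similarities) of each token's k nearest neighbors."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    np.fill_diagonal(sims, -np.inf)  # a token is never its own neighbor
    idx = np.argsort(-sims, axis=1)[:, :k]
    return idx, np.take_along_axis(sims, idx, axis=1)


def replacement_rate(step, total_steps, max_rate=0.3):
    """Assumed linear curriculum: no replacement at step 0, max_rate at the end."""
    return max_rate * min(1.0, step / total_steps)


def replace_with_neighbors(tokens, nbr_idx, nbr_sim, rate, rng):
    """Replace each input token with a sampled neighbor with probability `rate`.

    Neighbors are drawn with probability proportional to their cosine
    similarity to the original token (negative similarities clipped to 0,
    which is an assumption about what "truncated" means).
    """
    tokens = np.asarray(tokens)
    out = tokens.copy()
    chosen = rng.random(tokens.shape[0]) < rate
    for pos in np.flatnonzero(chosen):
        sims = np.clip(nbr_sim[tokens[pos]], 0.0, None)
        total = sims.sum()
        if total == 0.0:
            continue  # no usable neighbor; keep the original token
        out[pos] = rng.choice(nbr_idx[tokens[pos]], p=sims / total)
    return out


# Usage with random stand-in embeddings (vocabulary of 20 tokens, dimension 8).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(20, 8))
nbr_idx, nbr_sim = top_k_neighbors(embeddings, k=3)
past_targets = rng.integers(0, 20, size=12)
rate = replacement_rate(step=500, total_steps=1000)
model_inputs = replace_with_neighbors(past_targets, nbr_idx, nbr_sim, rate, rng)
```

Early in training the rate is near zero, so the model sees the true past targets (the deterministic teacher policy); as the rate grows, more inputs are swapped for similar tokens, which makes the policy stochastic. The neighbor table is computed once here; whether the paper recomputes it as embeddings change during training is not stated in the text above.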
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression
