TeaForN: Teacher-Forcing with N-grams
Sebastian Goodman, Nan Ding, Radu Soricut

TL;DR
TeaForN extends teacher-forcing with a stack of N decoders so that parameter updates are based on N prediction steps, addressing both exposure bias and the lack of differentiability across timesteps. It improves generation quality on one translation benchmark and two summarization benchmarks.
Contribution
The paper proposes TeaForN, a teacher-forcing variant that trains a stack of N decoders along a secondary time axis, directly addressing exposure bias and the lack of differentiability across timesteps. It applies to a wide class of decoder architectures.
Findings
Improves translation quality on WMT 2014 English-French
Enhances summarization results on CNN/Dailymail and Gigaword
Requires minimal modifications to standard teacher-forcing setups
Abstract
Sequence generation models trained with teacher-forcing suffer from issues related to exposure bias and lack of differentiability across timesteps. Our proposed method, Teacher-Forcing with N-grams (TeaForN), addresses both these problems directly, through the use of a stack of N decoders trained to decode along a secondary time axis that allows model parameter updates based on N prediction steps. TeaForN can be used with a wide class of decoder architectures and requires minimal modifications from a standard teacher-forcing setup. Empirically, we show that TeaForN boosts generation quality on one Machine Translation benchmark, WMT 2014 English-French, and two News Summarization benchmarks, CNN/Dailymail and Gigaword.
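The following is a minimal sketch of one plausible reading of the "stack of N decoders" idea, not the authors' implementation. It assumes the decoders share parameters, that decoder n is fed the expected embedding of decoder n-1's output distribution, and that decoder n at position t is scored against the target n steps further ahead. The toy decoder (a causal prefix mean followed by tanh), the per-decoder loss weights, and all names (`toy_decoder`, `teaforn_loss`, `E`, `U`, `W`) are hypothetical and chosen only for illustration. The code computes the forward loss only; in an autodiff framework, gradients would flow through the softmax-weighted embeddings from one decoder to the next.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def toy_decoder(inputs, U, W):
    # Hypothetical causal decoder: position t only sees inputs[0..t].
    T = inputs.shape[0]
    prefix_mean = np.cumsum(inputs, axis=0) / np.arange(1, T + 1)[:, None]
    return np.tanh(prefix_mean @ U) @ W  # logits, shape (T, V)

def cross_entropy(logits, targets):
    # Mean negative log-likelihood of the target tokens.
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(targets)), targets]))

def teaforn_loss(tokens, E, U, W, weights):
    # tokens: 1-D int array of length T+1 whose first entry is a start token.
    # weights: one loss weight per decoder in the stack (an assumption here).
    inputs = E[tokens[:-1]]  # decoder 0 sees ground-truth embeddings
    targets = tokens[1:]
    T = len(targets)
    total = 0.0
    for n, w in enumerate(weights):
        if n >= T:
            break  # no targets left that far ahead
        logits = toy_decoder(inputs, U, W)  # same parameters for every decoder
        # Decoder n at position t is scored against the target n steps ahead.
        total += w * cross_entropy(logits[:T - n], targets[n:])
        # The next decoder consumes the expected embedding of these predictions
        # instead of ground-truth tokens.
        inputs = softmax(logits) @ E
    return total

rng = np.random.default_rng(0)
V, D = 7, 4  # toy vocabulary size and embedding width
E = rng.normal(size=(V, D))
U = rng.normal(size=(D, D))
W = rng.normal(size=(D, V))
tokens = np.array([0, 3, 1, 4, 2, 6])
print(teaforn_loss(tokens, E, U, W, weights=[1.0, 0.5]))
```

With a single weight, the function reduces to the ordinary teacher-forcing cross-entropy; each further weight adds a term in which the model is conditioned on its own predictions rather than on ground truth.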
