TeaForN: Teacher-Forcing with N-grams

Sebastian Goodman; Nan Ding; Radu Soricut

arXiv:2010.03494·cs.CL·October 12, 2020

TL;DR

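TeaForN extends teacher-forcing with a stack of N decoders trained along a secondary time axis, so that parameter updates are based on N prediction steps rather than one. The method targets exposure bias and the lack of differentiability across timesteps, and improves generation quality on WMT 2014 English-French translation and on CNN/Dailymail and Gigaword summarization.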

Contribution

The paper proposes TeaForN, a teacher-forcing approach with N-grams that addresses exposure bias and the lack of differentiability across timesteps. It can be used with a wide class of decoder architectures.

Findings

01 Improves translation quality on WMT 2014 English-French

02 Improves summarization results on CNN/Dailymail and Gigaword

03 Requires minimal modifications to a standard teacher-forcing setup

Abstract

Sequence generation models trained with teacher-forcing suffer from issues related to exposure bias and lack of differentiability across timesteps. Our proposed method, Teacher-Forcing with N-grams (TeaForN), addresses both these problems directly, through the use of a stack of N decoders trained to decode along a secondary time axis that allows model parameter updates based on N prediction steps. TeaForN can be used with a wide class of decoder architectures and requires minimal modifications from a standard teacher-forcing setup. Empirically, we show that TeaForN boosts generation quality on one Machine Translation benchmark, WMT 2014 English-French, and two News Summarization benchmarks, CNN/Dailymail and Gigaword.
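The abstract describes the mechanism only at a high level, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the N decoders share parameters, that the first decoder reads gold prefix embeddings as in standard teacher-forcing, that each later decoder reads a softmax-weighted average of token embeddings produced by the decoder before it, and that the per-step losses are combined with a discount factor. The names `toy_decoder`, `teaforn_loss`, and `discount` are hypothetical, and the decoder itself is a toy stand-in (a causal running mean followed by a projection).

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def toy_decoder(inputs, W):
    # Hypothetical stand-in for a real decoder: position t sees only
    # inputs[0..t] (a causal running mean), then projects to vocab logits.
    T = inputs.shape[0]
    ctx = np.cumsum(inputs, axis=0) / np.arange(1, T + 1)[:, None]
    return ctx @ W  # shape (T, V)

def teaforn_loss(tokens, E, W, n_steps, discount=1.0):
    # tokens: int array of length T+1, with a start symbol at index 0.
    # E: (V, D) embedding table; W: (D, V) output projection.
    T = len(tokens) - 1
    inputs = E[tokens[:-1]]          # step 0: gold prefix (teacher-forcing)
    total, per_step = 0.0, []
    for n in range(n_steps):
        valid = T - n                # positions that still have a target
        if valid <= 0:
            break
        logits = toy_decoder(inputs, W)
        # Along the secondary axis, step n at position t predicts token t+1+n.
        targets = tokens[1 + n : 1 + n + valid]
        probs = softmax(logits[:valid])
        nll = -np.log(probs[np.arange(valid), targets]).mean()
        per_step.append(nll)
        total += (discount ** n) * nll
        # Assumption: the next decoder reads the expected embedding under the
        # current predictions, which keeps the whole chain differentiable.
        inputs = softmax(logits) @ E
    return total, per_step

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, D = 7, 4
    E = rng.normal(size=(V, D))
    W = rng.normal(size=(D, V))
    tokens = np.array([0, 3, 1, 4, 2])
    total, per_step = teaforn_loss(tokens, E, W, n_steps=3, discount=0.5)
    print("combined loss:", total)
    print("per-step losses:", per_step)
```

With `n_steps=1` the sketch reduces to ordinary teacher-forcing. Larger values add loss terms in which the decoder consumes its own soft predictions instead of gold tokens, which is how this reading connects to both exposure bias and differentiability across prediction steps.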
