Sequence Level Training with Recurrent Neural Networks

Marc'Aurelio Ranzato; Sumit Chopra; Michael Auli; Wojciech Zaremba

arXiv:1511.06732 · cs.LG · May 10, 2016 · ICLR · 946 cites

TL;DR

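This paper introduces a sequence level training method for RNN-based text generation that directly optimizes the metric used at test time, such as BLEU or ROUGE. With greedy generation it outperforms several strong baselines, and it remains competitive when those baselines use beam search while being several times faster.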

Contribution

It proposes a novel sequence level training algorithm that enhances text generation by aligning training objectives with test-time evaluation metrics.

Findings

01. Outperforms several strong baselines for greedy generation on three different tasks.

02. Remains competitive when those baselines employ beam search.

03. Runs several times faster than the beam search baselines.

Abstract

Many natural language processing applications use language models to generate text. These models are typically trained to predict the next word in a sequence, given the previous words and some context such as an image. However, at test time the model is expected to generate the entire sequence from scratch. This discrepancy makes generation brittle, as errors may accumulate along the way. We address this issue by proposing a novel sequence level training algorithm that directly optimizes the metric used at test time, such as BLEU or ROUGE. On three different tasks, our approach outperforms several strong baselines for greedy generation. The method is also competitive when these baselines employ beam search, while being several times faster.
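The contrast the abstract draws between next-word training and sequence level training can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the abstract: the reward is a toy clipped unigram precision standing in for BLEU or ROUGE, the sequence level loss is a REINFORCE-style surrogate, and the per-step distributions are fixed dictionaries rather than the outputs of a recurrent model. All function names are hypothetical.

```python
import math


def unigram_precision(candidate, reference):
    """Toy stand-in for BLEU/ROUGE (an assumption): clipped unigram precision."""
    if not candidate:
        return 0.0
    remaining = {}
    for tok in reference:
        remaining[tok] = remaining.get(tok, 0) + 1
    matches = 0
    for tok in candidate:
        if remaining.get(tok, 0) > 0:
            matches += 1
            remaining[tok] -= 1
    return matches / len(candidate)


def token_level_loss(step_probs, reference):
    """Next-word training: negative log-likelihood of each reference token."""
    return -sum(math.log(p[tok]) for p, tok in zip(step_probs, reference))


def sequence_level_loss(step_probs, sampled, reference, baseline=0.0):
    """REINFORCE-style surrogate (an assumption): the log-probability of the
    whole generated sequence is weighted by its sequence level reward."""
    reward = unigram_precision(sampled, reference)
    log_prob = sum(math.log(p[tok]) for p, tok in zip(step_probs, sampled))
    return -(reward - baseline) * log_prob


# Hypothetical per-step distributions over a three-word vocabulary.
step_probs = [
    {"a": 0.5, "cat": 0.25, "sat": 0.25},
    {"a": 0.25, "cat": 0.5, "sat": 0.25},
    {"a": 0.25, "cat": 0.25, "sat": 0.5},
]
reference = ["a", "cat", "sat"]
sampled = ["a", "cat", "cat"]

print(token_level_loss(step_probs, reference))
print(sequence_level_loss(step_probs, sampled, reference))
```

In this sketch the token level loss only ever scores the reference words, whereas the sequence level loss scores whatever the model actually generated, which is the discrepancy the abstract describes. In a real model each step's distribution would depend on the previously generated words; the fixed dictionaries here are only for illustration.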

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications