Token-level and sequence-level loss smoothing for RNN language models

Maha Elbayad; Laurent Besacier; Jakob Verbeek

arXiv:1805.05062·cs.CL·May 15, 2018

TL;DR

This paper introduces token-level and sequence-level loss smoothing for RNN language models. The two techniques target known weaknesses of maximum likelihood training, namely exposure bias and disregard for the structure of the output space, and yield significant improvements on image captioning and machine translation.

Contribution

The paper extends reward augmented maximum likelihood, a sequence-level smoothing approach, to token-level loss smoothing, and proposes improvements to the sequence-level smoothing itself.

Findings

01

Token-level and sequence-level smoothing are complementary.

02

Significant improvements in image captioning results.

03

Enhanced machine translation performance.

Abstract

Despite the effectiveness of recurrent neural network language models, their maximum likelihood estimation suffers from two limitations. First, it treats all sentences that do not match the ground truth as equally poor, ignoring the structure of the output space. Second, it suffers from "exposure bias": during training, tokens are predicted given ground-truth sequences, while at test time prediction is conditioned on generated output sequences. To overcome these limitations, we build upon the recent reward augmented maximum likelihood approach, i.e., sequence-level smoothing that encourages the model to predict sentences close to the ground truth according to a given performance metric. We extend this approach to token-level loss smoothing, and propose improvements to the sequence-level smoothing approach. Our experiments on two different tasks, image captioning and machine translation, show that…
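
To make the first limitation concrete, the following is a minimal sketch of token-level loss smoothing for a single time step. It assumes the smoothed target is a softmax over some similarity score between each vocabulary token and the ground-truth token, scaled by a temperature. The abstract does not specify how the target is built, so the similarity values, the temperature, the toy vocabulary, and the function names `smoothed_targets` and `cross_entropy` are all hypothetical.

```python
import math


def smoothed_targets(similarities, temperature):
    """Softmax over similarity scores: a target distribution for one time step."""
    scaled = [s / temperature for s in similarities]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def cross_entropy(target, probs):
    """Cross-entropy between a target distribution and model probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs))


# Hypothetical 4-word vocabulary; the ground-truth token is "cat".
vocab = ["cat", "kitten", "car", "the"]
one_hot = [1.0, 0.0, 0.0, 0.0]
# Made-up similarities of each vocabulary word to the ground-truth token.
sim_to_gold = [1.0, 0.8, 0.1, 0.0]
smooth = smoothed_targets(sim_to_gold, temperature=0.2)

# Two hypothetical model outputs with equal probability on the ground truth.
near_miss = [0.6, 0.3, 0.05, 0.05]  # spare mass on "kitten"
far_miss = [0.6, 0.05, 0.3, 0.05]   # spare mass on "car"

mle_near = cross_entropy(one_hot, near_miss)
mle_far = cross_entropy(one_hot, far_miss)
smooth_near = cross_entropy(smooth, near_miss)
smooth_far = cross_entropy(smooth, far_miss)

print(f"one-hot loss:  near={mle_near:.3f} far={mle_far:.3f}")
print(f"smoothed loss: near={smooth_near:.3f} far={smooth_far:.3f}")
```

With a one-hot target the two outputs receive the same loss, because only the probability of the ground-truth token counts. With the smoothed target, the output that places its remaining mass on the similar token gets the lower loss, which is the sense in which smoothing accounts for output space structure.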

Code & Models

Repositories

elbayadm/seq2seq
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling