Task Loss Estimation for Sequence Prediction

Dzmitry Bahdanau; Dmitriy Serdyuk; Philémon Brakel; Nan Rosemary Ke; Jan Chorowski; Aaron Courville; Yoshua Bengio

arXiv:1511.06456·cs.LG·January 20, 2016·28 cites

TL;DR

This paper introduces a method for deriving differentiable surrogate losses that are consistent with the task loss in sequence prediction. In speech recognition, it yields a 13% improvement in character error rate (CER) without additional language data.

Contribution

The paper proposes deriving surrogate losses from task loss estimation, which ensures consistency with the task loss and improves the training of sequence prediction models.

Findings

01 CER in speech recognition drops by about 13%.

02 A surrogate loss derived from task loss estimation improves model training.

03 The method ensures that the surrogate loss is consistent with the task loss.

Abstract

Often, the performance on a supervised machine learning task is evaluated with a task loss function that cannot be optimized directly. Examples of such loss functions include the classification error, the edit distance and the BLEU score. A common workaround for this problem is to instead optimize a surrogate loss function, such as cross-entropy or hinge loss. For this remedy to be effective, it is important to ensure that minimization of the surrogate loss results in minimization of the task loss, a condition that we call consistency with the task loss. In this work, we propose another method for deriving differentiable surrogate losses that provably meet this requirement. We focus on the broad class of models that define a score for every input-output pair. Our idea is that this score can be interpreted as an estimate of the task loss, and that…
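The idea of reading a model's score as an estimate of the task loss can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated before it states the form of the surrogate, so the squared-error regression below is an assumption, and the names `edit_distance`, `estimation_surrogate` and `predict` are hypothetical. Edit distance stands in for the task loss, and the scores are plain numbers that a model would normally produce.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two sequences (the task loss in this sketch)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(
                prev[j] + 1,             # drop a symbol from hyp
                curr[j - 1] + 1,         # insert a symbol from ref
                prev[j - 1] + (h != r),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def estimation_surrogate(scores, candidates, reference):
    """Assumed surrogate: mean squared gap between scores and true task losses."""
    losses = [edit_distance(c, reference) for c in candidates]
    return sum((s - l) ** 2 for s, l in zip(scores, losses)) / len(losses)


def predict(scores, candidates):
    """Return the candidate whose estimated task loss (score) is lowest."""
    best = min(range(len(scores)), key=scores.__getitem__)
    return candidates[best]


# Hypothetical scores for two candidate transcriptions of the reference "cat".
candidates = ["cat", "cut"]
scores = [0.0, 1.0]
surrogate = estimation_surrogate(scores, candidates, "cat")
choice = predict(scores, candidates)
```

In this toy case the scores equal the true edit distances, so the assumed surrogate is zero and the lowest-scoring candidate is also the one with the lowest task loss, which is the intuition behind consistency.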

Code & Models

Repositories

rizar/attention-lvcsr
pytorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis