Task Loss Estimation for Sequence Prediction
Dzmitry Bahdanau, Dmitriy Serdyuk, Philémon Brakel, Nan Rosemary Ke, Jan Chorowski, Aaron Courville, Yoshua Bengio

TL;DR
This paper introduces a method for deriving differentiable surrogate losses that are consistent with the task loss in sequence prediction. In speech recognition experiments, it reports a roughly 13% improvement in character error rate (CER) without additional language data.
Contribution
The paper proposes a new approach for deriving surrogate losses from task loss estimation. The resulting losses are provably consistent with the task loss, which is intended to improve the training of sequence prediction models.
Findings
Significant ~13% CER reduction in speech recognition.
Surrogate loss derived from task loss estimation improves model training.
Method ensures the surrogate loss is consistent with the task loss.
Abstract
Often, the performance on a supervised machine learning task is evaluated with a task loss function that cannot be optimized directly. Examples of such loss functions include the classification error, the edit distance and the BLEU score. A common workaround for this problem is to instead optimize a surrogate loss function, such as cross-entropy or hinge loss. In order for this remedy to be effective, it is important to ensure that minimization of the surrogate loss results in minimization of the task loss, a condition that we call consistency with the task loss. In this work, we propose another method for deriving differentiable surrogate losses that provably meet this requirement. We focus on the broad class of models that define a score for every input-output pair. Our idea is that this score can be interpreted as an estimate of the task loss, and that…
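To make the notion of a task loss concrete, here is a minimal Python sketch of the edit distance and the character error rate (CER) built on it. This is not the authors' code. The function names `edit_distance` and `character_error_rate` are illustrative, and the sketch assumes CER is the edit distance divided by the reference length, which is the usual convention. The edit distance takes only integer values, so it provides no useful gradient with respect to model parameters, which is why a differentiable surrogate is needed.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (e.g. strings).

    Counts the minimum number of insertions, deletions and
    substitutions needed to turn `hyp` into `ref`.
    """
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """Edit distance normalized by reference length (assumed convention)."""
    if len(ref) == 0:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `character_error_rate("abcd", "abcf")` counts one substitution over four reference characters.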
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
