Calibration of Encoder Decoder Models for Neural Machine Translation

Aviral Kumar; Sunita Sarawagi

arXiv:1903.00802·cs.LG·March 6, 2019·40 cites

TL;DR

This paper investigates calibration in neural machine translation (NMT) models, identifies two key causes of miscalibration (the end-of-sequence token and suppressed attention uncertainty), and proposes recalibration methods that improve confidence estimates and beam-search inference.

Contribution

It introduces new recalibration techniques that use EOS miscalibration and attention uncertainty as signals, improving both model calibration and beam-search performance in NMT systems.
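A minimal sketch of how these two signals could feed a recalibrator, assuming per-step decoder logits and attention weights are available as NumPy arrays. The temperature form (a softplus of a linear function of attention entropy), the additive EOS logit offset, the grid-search fit, and every function name below are illustrative assumptions, not the paper's exact parametrization.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def attention_entropy(attn):
    # attn: (steps, src_len), each row a distribution over source positions.
    p = np.clip(attn, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)


def recalibrated_probs(logits, attn, w, b, eos_id=None, eos_bias=0.0):
    # Hypothetical per-step temperature: softplus(w * H(attn) + b), kept > 0.
    temp = np.logaddexp(0.0, w * attention_entropy(attn) + b) + 1e-3
    z = logits / temp[:, None]  # new array; the caller's logits are untouched
    if eos_id is not None:
        # Hypothetical additive correction to the EOS logit after scaling.
        z[:, eos_id] += eos_bias
    return softmax(z)


def nll(probs, targets):
    # Mean negative log-likelihood of the reference tokens.
    picked = probs[np.arange(len(targets)), targets]
    return float(-np.mean(np.log(picked + 1e-12)))


def fit_grid(logits, attn, targets, ws, bs, eos_id=None, eos_biases=(0.0,)):
    # Pick (w, b, eos_bias) minimizing held-out NLL by exhaustive grid search.
    best = None
    for w in ws:
        for b in bs:
            for eb in eos_biases:
                probs = recalibrated_probs(logits, attn, w, b, eos_id, eb)
                loss = nll(probs, targets)
                if best is None or loss < best[0]:
                    best = (loss, w, b, eb)
    return best
```

Because each row is divided by a positive temperature, the top-1 token is unchanged when eos_bias is 0; only the confidence attached to it moves. The EOS offset is the one term that can change which token wins, which is why it is kept separate here.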

Findings

01

Improved sequence-level calibration accuracy

02

Enhanced beam-search inference results

03

Reduced EOS miscalibration and attention suppression

Abstract

We study the calibration of several state-of-the-art neural machine translation (NMT) systems built on attention-based encoder-decoder models. For structured outputs such as those in NMT, calibration matters not only for reliable confidence in predictions but also for the proper functioning of beam-search inference. We show that most modern NMT models are surprisingly miscalibrated, even when conditioned on the true previous tokens. Our investigation points to two main causes: severe miscalibration of EOS (the end-of-sequence marker) and suppression of attention uncertainty. We design recalibration methods based on these signals and demonstrate improved accuracy, better sequence-level calibration, and more intuitive results from beam search.
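Miscalibration "when conditioned on the true previous tokens" can be measured at the token level. The sketch below computes a common binned expected calibration error (ECE), assuming probs holds next-token distributions obtained by feeding the reference prefix to the decoder (teacher forcing) and targets holds the reference token ids. The equal-width binning and the function name are assumptions; the paper's exact metric may differ.

```python
import numpy as np


def expected_calibration_error(probs, targets, n_bins=10):
    # probs: (N, V) next-token distributions computed with the reference
    # prefix fed to the decoder (teacher forcing); targets: (N,) reference ids.
    conf = probs.max(axis=1)          # confidence of the top-1 token
    pred = probs.argmax(axis=1)       # top-1 token
    correct = (pred == targets).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)  # equal-width bins (lo, hi]
        if in_bin.any():
            # Weight each bin's |accuracy - confidence| gap by its share of tokens.
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)
```

For instance, expected_calibration_error(np.eye(3), np.array([0, 1, 2])) is 0.0, since every prediction is correct at confidence 1, whereas a model that always puts 0.9 on a wrong token scores 0.9.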

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications