Optimal Transport for Unsupervised Hallucination Detection in Neural Machine Translation
Nuno M. Guerreiro, Pierre Colombo, Pablo Piantanida, André F. T. Martins

TL;DR
This paper introduces an unsupervised optimal transport-based method for detecting hallucinations in neural machine translation, leveraging attention patterns to distinguish pathological outputs without requiring labeled data.
Contribution
It proposes a novel, fully unsupervised detector using optimal transport to identify hallucinations in NMT, outperforming previous model-based approaches.
Findings
Outperforms all previous model-based hallucination detectors
Competitive with large-model detectors trained on millions of samples
Effective across different attention-based NMT models
Abstract
Neural machine translation (NMT) has become the de-facto standard in real-world machine translation applications. However, NMT models can unpredictably produce severely pathological translations, known as hallucinations, that seriously undermine user trust. It becomes thus crucial to implement effective preventive strategies to guarantee their proper functioning. In this paper, we address the problem of hallucination detection in NMT by following a simple intuition: as hallucinations are detached from the source content, they exhibit encoder-decoder attention patterns that are statistically different from those of good quality translations. We frame this problem with an optimal transport formulation and propose a fully unsupervised, plug-in detector that can be used with any attention-based NMT model. Experimental results show that our detector not only outperforms all previous…
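The intuition in the abstract can be illustrated with a small sketch. This is not the authors' detector: it only assumes that each translation comes with an encoder-decoder attention matrix (target tokens by source tokens), collapses it into a distribution over source tokens, and measures how far that distribution is from a uniform reference. The uniform reference, the 0/1 ground cost (under which the optimal transport cost equals the total variation distance), and the function names `source_attention_mass`, `ot_distance_to_uniform`, and `hallucination_score` are all assumptions made for illustration.

```python
from typing import List


def source_attention_mass(attention: List[List[float]]) -> List[float]:
    """Collapse a (target_len x source_len) attention matrix into a
    probability distribution over source tokens by averaging over
    target tokens and renormalizing."""
    n_tgt = len(attention)
    n_src = len(attention[0])
    mass = [sum(row[j] for row in attention) / n_tgt for j in range(n_src)]
    total = sum(mass)
    return [m / total for m in mass]


def ot_distance_to_uniform(mass: List[float]) -> float:
    """Optimal transport cost between `mass` and the uniform distribution
    under a 0/1 ground cost (assumed here). With that cost, the optimal
    transport cost equals the total variation distance, i.e. half the
    L1 distance between the two distributions."""
    n = len(mass)
    return 0.5 * sum(abs(m - 1.0 / n) for m in mass)


def hallucination_score(attention: List[List[float]]) -> float:
    """Higher scores mean the attention mass is further from uniform."""
    return ot_distance_to_uniform(source_attention_mass(attention))


# Toy attention maps (hypothetical values, not taken from the paper).
# Each target token attends mostly to its own source token.
healthy = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
# Every target token puts all its attention on the last source token.
degenerate = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]

healthy_score = hallucination_score(healthy)
degenerate_score = hallucination_score(degenerate)
print(f"healthy:    {healthy_score:.3f}")
print(f"degenerate: {degenerate_score:.3f}")
```

In this toy setup the degenerate map, where attention ignores most of the source, receives the larger score. In practice a threshold on such a score would have to be chosen from held-out translations, and the choice of reference distribution and ground cost is a design decision rather than something fixed by this sketch.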
Taxonomy
Topics: Adversarial Robustness in Machine Learning
