Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection

Weijia Xu; Sweta Agrawal; Eleftheria Briakou; Marianna J. Martindale; Marine Carpuat

arXiv:2301.07779·cs.CL·February 28, 2023·6 cites

TL;DR

This paper investigates which internal signals of neural machine translation models indicate hallucinations, and uses them to build a lightweight detector that outperforms model-free baselines and classifiers based on quality estimation or large pre-trained models on English-Chinese and German-English test beds.

Contribution

It introduces a novel analysis of model internals to detect hallucinations and proposes an effective, lightweight hallucination detection method for neural machine translation.

Findings

01 Internal model symptoms reliably indicate hallucinations

02 The proposed detector outperforms model-free baselines and classifiers based on quality estimation or large pre-trained models

03 Effective on manually annotated English-Chinese and German-English test beds

Abstract

Neural sequence generation models are known to "hallucinate" by producing outputs that are unrelated to the source text. These hallucinations are potentially harmful, yet it remains unclear under what conditions they arise and how to mitigate their impact. In this work, we first identify internal model symptoms of hallucinations by analyzing the relative token contributions to the generation in contrastive hallucinated vs. non-hallucinated outputs generated via source perturbations. We then show that these symptoms are reliable indicators of natural hallucinations by using them to design a lightweight hallucination detector, which outperforms both model-free baselines and strong classifiers based on quality estimation or large pre-trained models on manually annotated English-Chinese and German-English translation test beds.
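The abstract does not specify which features or classifier the detector uses, so the following is only a minimal sketch of the general idea of turning relative token contributions into a detection signal. It assumes a precomputed matrix of non-negative contribution scores (one row per generated target token, one column per input token, source tokens first). The function names, the summary statistics, and the `min_share` threshold are all hypothetical and are not taken from the paper or its repository.

```python
from statistics import mean, pstdev


def source_contribution_features(contrib, n_src):
    """Summarize how much each generation step relies on the source.

    contrib[t][i] is an assumed non-negative contribution of input token i
    to target step t. The first n_src columns are source tokens; the
    remaining columns are target-prefix tokens.
    """
    shares = []
    for row in contrib:
        total = sum(row)
        if total <= 0:
            raise ValueError("each row needs a positive total contribution")
        # Relative contribution of the source at this step, in [0, 1].
        shares.append(sum(row[:n_src]) / total)
    return {
        "mean_source_share": mean(shares),
        "std_source_share": pstdev(shares),
    }


def flag_hallucination(contrib, n_src, min_share=0.3):
    """Hypothetical rule: flag outputs whose average source share is low.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    feats = source_contribution_features(contrib, n_src)
    return feats["mean_source_share"] < min_share


# Toy inputs: two source tokens followed by one target-prefix token.
grounded = [[0.6, 0.2, 0.2], [0.5, 0.3, 0.2]]
detached = [[0.05, 0.05, 0.9], [0.1, 0.0, 0.9]]
print(flag_hallucination(grounded, n_src=2))
print(flag_hallucination(detached, n_src=2))
```

In practice a rule like this would be replaced by a small classifier trained on such features; the fixed threshold is used here only to keep the sketch self-contained.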

Code & Models

Repositories

weijia-xu/hallucinations-in-nmt
tf · Official

Taxonomy

Topics: Topic Modeling

Methods: Test