Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models

Kenza Benkirane; Laura Gongas; Shahar Pelles; Naomi Fuchs; Joshua Darmon; Pontus Stenetorp; David Ifeoluwa Adelani; Eduardo Sánchez

arXiv:2407.16470·cs.CL·October 22, 2024·1 cites

TL;DR

This paper evaluates how well Large Language Models detect hallucinations in machine translation across high- and low-resource languages. It shows that the choice of model strongly affects performance and that the best-performing model differs between the two groups.

Contribution

It systematically compares LLM-based hallucination detection methods across 16 language directions, showing how detection performance differs between high-resource and low-resource languages.

Findings

01. Llama3-70B outperforms previous methods on high-resource languages.

02. Claude Sonnet performs best on low-resource languages.

03. LLMs can match or surpass specialized models despite no explicit MT training.

Abstract

Recent advancements in massively multilingual machine translation systems have significantly enhanced translation accuracy; however, even the best performing systems still generate hallucinations, severely impacting user trust. Detecting hallucinations in Machine Translation (MT) remains a critical challenge, particularly since existing methods excel with High-Resource Languages (HRLs) but exhibit substantial limitations when applied to Low-Resource Languages (LRLs). This paper evaluates sentence-level hallucination detection approaches using Large Language Models (LLMs) and semantic similarity within massively multilingual embeddings. Our study spans 16 language directions, covering both HRLs and LRLs across diverse scripts. We find that the choice of model is essential for performance. On average, for HRLs, Llama3-70B outperforms the previous state of the art by as much as 0.16 MCC (Matthews…
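The abstract names two ingredients that a small example can make concrete: flagging a translation as a hallucination from the semantic similarity of source and translation embeddings, and scoring a detector with MCC (the Matthews correlation coefficient). The sketch below is illustrative only and is not the authors' implementation. The function names, the toy 2-dimensional vectors standing in for multilingual sentence embeddings, and the 0.5 threshold are all assumptions made for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def flag_hallucinations(src_embs, mt_embs, threshold=0.5):
    """Label a pair 1 (hallucination) when similarity falls below the threshold.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out data.
    """
    return [
        1 if cosine_similarity(s, m) < threshold else 0
        for s, m in zip(src_embs, mt_embs)
    ]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = hallucination)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy data: the first pair is aligned, the second is orthogonal (unrelated).
src = [[1.0, 0.0], [0.0, 1.0]]
mt = [[1.0, 0.0], [1.0, 0.0]]
predictions = flag_hallucinations(src, mt)
gold = [0, 1]
score = mcc(gold, predictions)
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which makes it a reasonable choice when hallucinations are rare relative to correct translations.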

Code & Models

Repositories

kenza-ily/mt_hallucination_detection
Official

Taxonomy

Topics: Text Readability and Simplification