Do LLM hallucination detectors suffer from low-resource effect?
Debtanu Datta, Mohan Kishore Chilukuri, Yash Kumar, Saptarshi Ghosh, Muhammad Bilal Zafar

TL;DR
This paper investigates whether hallucination detectors for large language models suffer from the low-resource effect (the performance drop LLMs show in low-resource languages). It finds that the detectors are more robust than the models themselves across languages and settings.
Contribution
The study reveals that hallucination detectors maintain relatively stable accuracy in low-resource languages, indicating they encode signals about model uncertainty beyond raw task performance.
Findings
Detectors are more robust than LLMs in low-resource languages.
Detector accuracy drops less than task accuracy in low-resource settings.
Robustness of detectors is limited in cross-lingual transfer without supervision.
Abstract
LLMs, while outperforming humans in a wide range of tasks, can still fail in unanticipated ways. We focus on two pervasive failure modes: (i) hallucinations, where models produce incorrect information about the world, and (ii) the low-resource effect, where the models show impressive performance in high-resource languages like English but the performance degrades significantly in low-resource languages like Bengali. We study the intersection of these issues and ask: do hallucination detectors suffer from the low-resource effect? We conduct experiments on five tasks across three domains (factual recall, STEM, and Humanities). Experiments with four LLMs and three hallucination detectors reveal a curious finding: As expected, the task accuracies in low-resource languages experience large drops (compared to English). However, the drop in detectors' accuracy is often several times smaller…
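The abstract's central comparison is between two relative drops: how much task accuracy falls from English to a low-resource language, and how much detector accuracy falls over the same shift. The sketch below shows one way to compute both. It makes several assumptions that are not taken from the paper. Detector "accuracy" is treated as binary agreement between the detector's hallucination flag and the answer actually being wrong, and the drop is measured relative to the English score. All names (`Example`, `task_accuracy`, `detector_accuracy`, `relative_drop`) are hypothetical. The toy English and Bengali examples are invented for illustration only and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    answer_correct: bool         # was the LLM's answer to the question correct?
    flagged_hallucination: bool  # did the detector flag the answer as a hallucination?


def task_accuracy(examples: list[Example]) -> float:
    # Fraction of questions the LLM answered correctly.
    return sum(e.answer_correct for e in examples) / len(examples)


def detector_accuracy(examples: list[Example]) -> float:
    # Assumed definition: the detector is right when it flags exactly the wrong answers.
    return sum(e.flagged_hallucination == (not e.answer_correct) for e in examples) / len(examples)


def relative_drop(high_resource: float, low_resource: float) -> float:
    # Drop measured as a fraction of the high-resource (e.g. English) score.
    return (high_resource - low_resource) / high_resource


# Hypothetical toy data, NOT from the paper.
english = [Example(True, False)] * 8 + [Example(False, True)] * 1 + [Example(False, False)] * 1
bengali = [Example(True, False)] * 4 + [Example(False, True)] * 4 + [Example(False, False)] * 2

task_drop = relative_drop(task_accuracy(english), task_accuracy(bengali))
detector_drop = relative_drop(detector_accuracy(english), detector_accuracy(bengali))

print(f"task accuracy drop:     {task_drop:.2f}")      # task accuracy drop:     0.50
print(f"detector accuracy drop: {detector_drop:.2f}")  # detector accuracy drop: 0.11
```

In this toy setting, task accuracy halves (0.8 to 0.4). Detector accuracy only slips from 0.9 to 0.8, because the detector still flags most of the extra wrong answers. This is the qualitative pattern the abstract describes as a detector drop "several times smaller" than the task drop.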
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Multimodal Machine Learning Applications
