Truth Knows No Language: Evaluating Truthfulness Beyond English
Blanca Calvo Figueras, Eneko Sagarzazu, Julen Etxaniz, Jeremy Barnes, Pablo Gamallo, Iria de-Dios-Flores, Rodrigo Agerri

TL;DR
This paper extends the TruthfulQA benchmark to Basque, Catalan, Galician, and Spanish and evaluates LLM truthfulness across these languages and English. It finds that translation and language resource levels influence performance, and that LLM-as-a-Judge scoring aligns more closely with human judgments than multiple-choice metrics do.
Contribution
It introduces a multilingual truthfulness benchmark covering Basque, Catalan, Galician, and Spanish, compares base and instruction-tuned LLMs across languages, and demonstrates the effectiveness of machine translation for extending evaluations.
Findings
LLMs perform best in English, worst in Basque
LLM-as-a-Judge correlates more closely with human judgments than multiple-choice metrics
Machine translation is a viable method for multilingual benchmarking
Abstract
We introduce a professionally translated extension of the TruthfulQA benchmark designed to evaluate truthfulness in Basque, Catalan, Galician, and Spanish. Truthfulness evaluations of large language models (LLMs) have primarily been conducted in English. However, the ability of LLMs to maintain truthfulness across languages remains under-explored. Our study evaluates 12 state-of-the-art open LLMs, comparing base and instruction-tuned models using human evaluation, multiple-choice metrics, and LLM-as-a-Judge scoring. Our findings reveal that, while LLMs perform best in English and worst in Basque (the lowest-resourced language), overall truthfulness discrepancies across languages are smaller than anticipated. Furthermore, we show that LLM-as-a-Judge correlates more closely with human judgments than multiple-choice metrics, and that informativeness plays a critical role in truthfulness…
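To make the comparison between automatic metrics and human judgments concrete, the following is a minimal sketch of how one might check which metric tracks human labels more closely. It assumes Pearson correlation as the agreement measure; the abstract does not state which statistic the authors use. The names `human`, `judge`, and `mc`, and all values in them, are hypothetical toy data and are not results from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data (NOT from the paper): one entry per model answer.
human = [1, 0, 1, 1, 0, 0]              # human label: 1 = truthful, 0 = not
judge = [1, 0, 1, 1, 0, 1]              # assumed binary LLM-as-a-Judge verdicts
mc = [0.6, 0.5, 0.4, 0.7, 0.3, 0.6]     # assumed multiple-choice scores in [0, 1]

r_judge = pearson(human, judge)
r_mc = pearson(human, mc)

print(f"judge vs. human:           r = {r_judge:.3f}")
print(f"multiple-choice vs. human: r = {r_mc:.3f}")
```

On this toy data the judge verdicts correlate more strongly with the human labels than the multiple-choice scores do, which mirrors the direction of the reported finding but says nothing about its size.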
Taxonomy
Topics: Deception detection and forensic psychology · Interpreting and Communication in Healthcare · Epistemology, Ethics, and Metaphysics
Methods: Balanced Selection
