MUCH: A Multilingual Claim Hallucination Benchmark
Jérémie Dentan, Alexi Canesse, Davide Buscaldi, Aymen Shabou, Sonia Vanier

TL;DR
This paper introduces MUCH, a comprehensive multilingual benchmark for claim-level uncertainty quantification in LLMs, featuring a large dataset, detailed logits, and a fast segmentation algorithm to evaluate and improve UQ methods.
Contribution
We present the first multilingual claim-level UQ benchmark that ships detailed logits and a segmentation algorithm fast enough for real-time use, enabling fair and reproducible evaluation of future methods.
Findings
Current UQ methods show significant room for improvement.
The benchmark supports fair comparison across languages.
Our segmentation algorithm is efficient and suitable for real-time applications.
Abstract
Claim-level Uncertainty Quantification (UQ) is a promising approach to mitigate the lack of reliability in Large Language Models (LLMs). We introduce MUCH, the first claim-level UQ benchmark designed for fair and reproducible evaluation of future methods under realistic conditions. It includes 4,873 samples across four European languages (English, French, Spanish, and German) and four instruction-tuned open-weight LLMs. Unlike prior claim-level benchmarks, we release 24 generation logits per token, facilitating the development of future white-box methods without re-generating data. Moreover, in contrast to previous benchmarks that rely on manual or LLM-based segmentation, we propose a new deterministic algorithm capable of segmenting claims using as little as 0.2% of the LLM generation time. This makes our segmentation approach suitable for real-time monitoring of LLM outputs, ensuring…
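To make the white-box use case concrete, here is a minimal sketch (not the MUCH method) of how a simple baseline could turn stored per-token logits into a claim-level confidence score. It rests on hypothetical assumptions: each token's stored logits include the generated token at a known index, and each claim is given as a half-open token span. The names `token_logprob` and `claim_confidence` and this span format are illustrative only, not the benchmark's actual schema.

```python
import math
from typing import Sequence


def token_logprob(stored_logits: Sequence[float], chosen_index: int) -> float:
    """Log-probability of the generated token, renormalized over the stored logits only.

    Assumption: the generated token is among the stored logits, at chosen_index.
    Because logits outside the stored set are discarded, this over-estimates
    the model's true token probability.
    """
    m = max(stored_logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(x - m) for x in stored_logits))
    return stored_logits[chosen_index] - log_z


def claim_confidence(
    logits_per_token: Sequence[Sequence[float]],
    chosen: Sequence[int],
    start: int,
    end: int,
) -> float:
    """Geometric-mean token probability over the claim's tokens [start, end).

    The (start, end) span format is a hypothetical choice for illustration.
    """
    if end <= start:
        raise ValueError("a claim must cover at least one token")
    lps = [token_logprob(logits_per_token[i], chosen[i]) for i in range(start, end)]
    return math.exp(sum(lps) / len(lps))


# Usage: two tokens with 3 stored logits each; the second claim token is confident.
logits = [[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
print(claim_confidence(logits, chosen=[0, 0], start=0, end=2))
```

Averaging log-probabilities rather than multiplying raw probabilities keeps the score comparable across claims of different lengths; a real baseline might instead use the minimum token probability or entropy over the stored logits.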
