Halluverse-M^3: A multitask multilingual benchmark for hallucination in LLMs
Samir Abdaljalil, Parichit Sharma, Erchin Serpedin, Hasan Kurban

TL;DR
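Halluverse-M^3 is a multilingual benchmark dataset for evaluating hallucinations in large language models across four languages (English, Arabic, Hindi, and Turkish), two generation tasks (question answering and dialogue summarization), and three hallucination types (entity-level, relation-level, and sentence-level), intended to support future research on detection and mitigation.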
Contribution
This work introduces Halluverse-M^3, a novel dataset enabling systematic analysis of hallucinations across languages, tasks, and categories, with human-validated, controlled hallucinated outputs.
Findings
Hallucinations are easier to detect in question answering than in dialogue summarization.
Sentence-level hallucinations are more challenging to detect than entity-level or relation-level ones.
Detection accuracy decreases in lower-resource languages, especially Hindi.
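The findings above compare detection accuracy along three axes: task, hallucination level, and language. As a minimal sketch of how such a breakdown could be computed, assuming a flat list of records, the snippet below groups predictions by one axis at a time. The field names (`language`, `task`, `level`, `gold`, `pred`), the label strings, and the four toy records are hypothetical placeholders invented for illustration; they are not the dataset's actual schema or results.

```python
from collections import defaultdict

# Toy records, invented purely to exercise the function below.
# Field names and label strings are assumptions, not the dataset's schema.
records = [
    {"language": "en", "task": "qa", "level": "entity",
     "gold": "hallucinated", "pred": "hallucinated"},
    {"language": "en", "task": "summarization", "level": "sentence",
     "gold": "hallucinated", "pred": "faithful"},
    {"language": "hi", "task": "qa", "level": "relation",
     "gold": "hallucinated", "pred": "hallucinated"},
    {"language": "hi", "task": "summarization", "level": "sentence",
     "gold": "hallucinated", "pred": "hallucinated"},
]


def accuracy_by(records, key):
    """Return {group value: fraction of records where pred == gold}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = record[key]
        total[group] += 1
        correct[group] += int(record["pred"] == record["gold"])
    return {group: correct[group] / total[group] for group in total}


# One breakdown per axis named in the findings.
for axis in ("task", "level", "language"):
    print(axis, accuracy_by(records, axis))
```

Grouping on one key at a time keeps each comparison readable; a joint breakdown (for example, task within language) could use a tuple of fields as the group key instead.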
Abstract
Hallucinations in large language models remain a persistent challenge, particularly in multilingual and generative settings where factual consistency is difficult to maintain. While recent models show strong performance on English-centric benchmarks, their behavior across languages, tasks, and hallucination types is not yet well understood. In this work, we introduce Halluverse-M^3, a dataset designed to enable systematic analysis of hallucinations across multiple languages, multiple generation tasks, and multiple hallucination categories. Halluverse-M^3 covers four languages, English, Arabic, Hindi, and Turkish, and supports two generation tasks: question answering and dialogue summarization. The dataset explicitly distinguishes between entity-level, relation-level, and sentence-level hallucinations. Hallucinated outputs are constructed through a controlled editing process and…
