REFIND at SemEval-2025 Task 3: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models
DongGeon Lee, Hwanjo Yu

TL;DR
REFIND is a retrieval-augmented framework that detects hallucinated spans in large language model outputs by combining retrieved documents with a new metric, the Context Sensitivity Ratio (CSR), and is evaluated across nine languages.
Contribution
The paper introduces REFIND, a retrieval-augmented method that uses the Context Sensitivity Ratio (CSR) to detect hallucinated spans in LLM outputs across diverse languages.
Findings
Significantly outperformed baseline models at detecting hallucinated spans.
Demonstrated robustness across nine languages, including low-resource settings.
Achieved superior IoU scores in identifying hallucinated spans.
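For readers unfamiliar with the metric, the following is a minimal sketch of intersection-over-union (IoU) computed over character positions of predicted and gold hallucinated spans. The function name `span_iou`, the half-open `(start, end)` span format, and the convention of returning 1.0 when both sides are empty are assumptions made for illustration; the summary above does not specify the exact scoring script used in the task.

```python
def span_iou(predicted, gold):
    """Character-level IoU between two lists of (start, end) spans.

    Spans are half-open: (3, 8) covers character indices 3..7.
    This format is an assumption for the sketch.
    """
    # Expand each span list into the set of character indices it covers.
    pred_chars = {i for start, end in predicted for i in range(start, end)}
    gold_chars = {i for start, end in gold for i in range(start, end)}

    # Assumed convention: no predicted and no gold spans counts as a perfect match.
    if not pred_chars and not gold_chars:
        return 1.0

    return len(pred_chars & gold_chars) / len(pred_chars | gold_chars)


if __name__ == "__main__":
    # Predicted span covers 0..4, gold span covers 3..7: 2 shared of 8 total.
    print(span_iou([(0, 5)], [(3, 8)]))
```

Working on character sets rather than span pairs means overlapping or adjacent predicted spans are merged automatically before scoring.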
Abstract
Hallucinations in large language model (LLM) outputs severely limit their reliability in knowledge-intensive tasks such as question answering. To address this challenge, we introduce REFIND (Retrieval-augmented Factuality hallucINation Detection), a novel framework that detects hallucinated spans within LLM outputs by directly leveraging retrieved documents. As part of REFIND, we propose the Context Sensitivity Ratio (CSR), a novel metric that quantifies the sensitivity of LLM outputs to retrieved evidence. This innovative approach enables REFIND to efficiently and accurately detect hallucinations, setting it apart from existing methods. In the evaluation, REFIND demonstrated robustness across nine languages, including low-resource settings, and significantly outperformed baseline models, achieving superior IoU scores in identifying hallucinated spans. This work highlights the…
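The abstract does not give the formula for CSR, so the following is only a sketch of one plausible reading: compare each output token's probability with and without the retrieved documents in the prompt, and flag tokens whose probability drops sharply once the evidence is present. The log-probability ratio, the threshold value, the clamping constant, and the names `sensitivity_ratio` and `flag_spans` are all assumptions for illustration, not the authors' definition. The per-token probabilities are taken as given inputs; obtaining them from an actual model is outside the sketch.

```python
import math


def sensitivity_ratio(p_with, p_without, eps=1e-9):
    """Hypothetical per-token score: log p(token | evidence) / log p(token | no evidence).

    Both logs are negative, so a ratio above 1 means the token became
    less likely once the retrieved evidence was added.
    """
    # Clamp into (0, 1) so both logarithms are finite and nonzero.
    p_with = min(max(p_with, eps), 1.0 - eps)
    p_without = min(max(p_without, eps), 1.0 - eps)
    return math.log(p_with) / math.log(p_without)


def flag_spans(tokens, p_with, p_without, threshold=1.5):
    """Return merged (start, end) character spans of flagged tokens.

    Assumes the tokens concatenate exactly to the output text
    (each token carries its own leading whitespace).
    The threshold is an arbitrary illustrative value.
    """
    spans = []
    pos = 0
    for tok, pw, po in zip(tokens, p_with, p_without):
        start, end = pos, pos + len(tok)
        pos = end
        if sensitivity_ratio(pw, po) >= threshold:
            if spans and spans[-1][1] == start:
                # Extend the previous span when flagged tokens are adjacent.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


if __name__ == "__main__":
    tokens = ["Paris", " is", " in", " Ger", "many"]
    with_docs = [0.9, 0.9, 0.9, 0.01, 0.5]      # made-up probabilities
    without_docs = [0.9, 0.9, 0.9, 0.5, 0.9]    # made-up probabilities
    print(flag_spans(tokens, with_docs, without_docs))
```

Under these assumptions, the output is a list of character spans that can be scored directly against gold annotations with a span-level IoU.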
Taxonomy
Topics: Data-Driven Disease Surveillance
