REFIND at SemEval-2025 Task 3: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models

DongGeon Lee; Hwanjo Yu

arXiv:2502.13622 · cs.CL · April 9, 2025

Open Access

TL;DR

REFIND is a retrieval-augmented framework that detects hallucinated spans in large language model outputs by leveraging retrieved documents and a novel sensitivity metric, improving reliability across multiple languages.

Contribution

The paper introduces REFIND, a new retrieval-augmented method with the Context Sensitivity Ratio for effective hallucination detection in LLM outputs across diverse languages.

Findings

01. Outperformed baseline models in hallucination detection accuracy.

02. Demonstrated robustness across nine languages, including low-resource settings.

03. Achieved superior IoU scores in identifying hallucinated spans.
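
The IoU (intersection over union) score mentioned above can be illustrated with a minimal sketch. It assumes spans are given as half-open character offsets `(start, end)` and that two empty span sets count as a perfect match; both conventions, and the function name `span_iou`, are assumptions made for illustration and are not taken from the paper or the task's official scorer.

```python
from typing import Iterable, Set, Tuple

Span = Tuple[int, int]  # half-open character range [start, end); an assumed convention


def _covered_chars(spans: Iterable[Span]) -> Set[int]:
    """Expand a collection of spans into the set of character indices they cover."""
    return {i for start, end in spans for i in range(start, end)}


def span_iou(predicted: Iterable[Span], gold: Iterable[Span]) -> float:
    """Character-level intersection over union between predicted and gold spans."""
    pred_chars = _covered_chars(predicted)
    gold_chars = _covered_chars(gold)
    union = pred_chars | gold_chars
    if not union:
        # Assumed convention: nothing predicted and nothing annotated is a perfect match.
        return 1.0
    return len(pred_chars & gold_chars) / len(union)


if __name__ == "__main__":
    # Prediction covers characters 0-4, gold covers 3-7: 2 shared out of 8 total.
    print(span_iou([(0, 5)], [(3, 8)]))
```

Working at the character level means overlapping or adjacent predicted spans are merged automatically, so the score does not depend on how a detector happens to segment its output.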

Abstract

Hallucinations in large language model (LLM) outputs severely limit their reliability in knowledge-intensive tasks such as question answering. To address this challenge, we introduce REFIND (Retrieval-augmented Factuality hallucINation Detection), a novel framework that detects hallucinated spans within LLM outputs by directly leveraging retrieved documents. As part of REFIND, we propose the Context Sensitivity Ratio (CSR), a novel metric that quantifies the sensitivity of LLM outputs to retrieved evidence. This approach enables REFIND to detect hallucinations efficiently and accurately, setting it apart from existing methods. In the evaluation, REFIND demonstrated robustness across nine languages, including low-resource settings, and significantly outperformed baseline models, achieving superior IoU scores in identifying hallucinated spans. This work highlights the…
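
The abstract does not give the formula for the Context Sensitivity Ratio, so the following is only a sketch of the general idea of measuring how much each output token's likelihood depends on retrieved evidence. It assumes per-token log-probabilities have already been obtained from some language model, once with the retrieved documents in the prompt and once without. The ratio used here (probability without evidence divided by probability with evidence), the flagging direction, the threshold of 2.0, and all function names are hypothetical choices for illustration, not the authors' definition.

```python
import math
from typing import List, Sequence


def sensitivity_ratios(
    logp_with_evidence: Sequence[float],
    logp_without_evidence: Sequence[float],
) -> List[float]:
    """Per-token ratio p(token | no evidence) / p(token | evidence).

    Illustrative stand-in for a context-sensitivity score: a value well
    above 1 means the token became much less likely once the retrieved
    documents were shown to the model.
    """
    if len(logp_with_evidence) != len(logp_without_evidence):
        raise ValueError("both sequences must score the same tokens")
    return [
        math.exp(without - with_)
        for with_, without in zip(logp_with_evidence, logp_without_evidence)
    ]


def flag_tokens(ratios: Sequence[float], threshold: float = 2.0) -> List[bool]:
    """Mark tokens whose ratio meets the (assumed) threshold as suspect."""
    return [r >= threshold for r in ratios]


if __name__ == "__main__":
    tokens = ["Paris", "is", "in", "Germany"]
    # Made-up probabilities: only the last token is contradicted by the evidence.
    with_evidence = [math.log(p) for p in (0.9, 0.8, 0.8, 0.05)]
    without_evidence = [math.log(p) for p in (0.9, 0.8, 0.8, 0.5)]
    flags = flag_tokens(sensitivity_ratios(with_evidence, without_evidence))
    print([tok for tok, flagged in zip(tokens, flags) if flagged])
```

Consecutive flagged tokens could then be merged into character spans for span-level evaluation; how that merging is done in REFIND is not stated in the text shown here.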

Taxonomy

Topics: Data-Driven Disease Surveillance