TL;DR
FaithLens is a cost-efficient model that detects faithfulness hallucinations in LLM outputs and explains each prediction, improving trustworthiness in tasks such as retrieval-augmented generation and summarization.
Contribution
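Introduces FaithLens, an 8B-parameter detector that is fine-tuned on filtered synthetic data as a cold start and then optimized with rule-based reinforcement learning, so that it jointly predicts faithfulness hallucinations and explains them while outperforming larger models.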
Findings
The 8B-parameter FaithLens outperforms GPT-5.2 and o3 across 12 diverse tasks.
It provides high-quality explanations alongside predictions.
The model balances trustworthiness, efficiency, and effectiveness.
Abstract
Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucinations is crucial for real-world applications, e.g., retrieval-augmented generation and summarization. In this paper, we introduce FaithLens, a cost-efficient and effective faithfulness hallucination detection model that can jointly provide binary predictions and corresponding explanations to improve trustworthiness. To achieve this, we first synthesize training data with explanations via advanced LLMs and apply a well-defined data filtering strategy to ensure label correctness, explanation quality, and data diversity. Subsequently, we fine-tune the model on this well-curated training data as a cold start and further optimize it with rule-based reinforcement learning, using rewards for both prediction correctness and explanation quality. Results on 12 diverse tasks show that the 8B-parameter…
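To make the reward design in the abstract concrete, here is a minimal Python sketch of a rule-based reward that combines prediction correctness with explanation quality. It is not the authors' implementation. The tagged output format, the names `parse_output` and `rule_based_reward`, the weights, and the pluggable `explanation_scorer` are all assumptions made for illustration; the abstract does not say how explanation quality is measured.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DetectorOutput:
    explanation: str
    label: Optional[bool]  # True = hallucinated, False = faithful, None = unparseable


# Hypothetical output format: an explanation followed by a yes/no label.
_OUTPUT_PATTERN = re.compile(
    r"<explanation>(.*?)</explanation>\s*<label>(yes|no)</label>",
    re.DOTALL | re.IGNORECASE,
)


def parse_output(text: str) -> DetectorOutput:
    """Extract the explanation and binary label from a model response."""
    match = _OUTPUT_PATTERN.search(text)
    if match is None:
        return DetectorOutput(explanation="", label=None)
    return DetectorOutput(
        explanation=match.group(1).strip(),
        label=match.group(2).lower() == "yes",
    )


def rule_based_reward(
    text: str,
    gold_label: bool,
    explanation_scorer: Callable[[str], float],
    w_pred: float = 1.0,  # assumed weight, not taken from the paper
    w_expl: float = 1.0,  # assumed weight, not taken from the paper
) -> float:
    """Combine a prediction-correctness reward with an explanation-quality reward."""
    out = parse_output(text)
    if out.label is None:
        # Responses that break the expected format earn nothing.
        return 0.0
    pred_reward = 1.0 if out.label == gold_label else 0.0
    # Clamp the scorer's value to [0, 1] so the two terms stay comparable.
    expl_reward = min(1.0, max(0.0, explanation_scorer(out.explanation)))
    return w_pred * pred_reward + w_expl * expl_reward


def nonempty_scorer(explanation: str) -> float:
    """Toy stand-in scorer: rewards any non-empty explanation."""
    return 1.0 if explanation else 0.0


response = (
    "<explanation>The claim cites a date absent from the document.</explanation>"
    "<label>yes</label>"
)
reward = rule_based_reward(response, gold_label=True, explanation_scorer=nonempty_scorer)
```

In this sketch a correct label with a non-empty explanation scores 2.0, a wrong label with an explanation scores 1.0, and a malformed response scores 0.0. A real setup would replace `nonempty_scorer` with whatever explanation-quality signal the training recipe defines.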
