FRED: Financial Retrieval-Enhanced Detection and Editing of Hallucinations in Language Models

Likun Tan; Kuan-Wei Huang; Kevin Wu

arXiv:2507.20930 · cs.CL · July 31, 2025

TL;DR

This paper introduces FRED, a method for detecting and editing hallucinated, factually incorrect content in language model responses for financial applications, improving their reliability and trustworthiness in high-stakes domains.

Contribution

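The paper presents a framework for detecting and editing hallucinations in language model responses. It is built on a user-defined, domain-specific error taxonomy and a synthetic dataset created by inserting tagged errors into financial question-answering data.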
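The synthetic-data step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the three taxonomy labels, the `inject_error` helper, the `<delete>`/`<insert>` tag format, and the toy revenue figures are all hypothetical. It shows one way to turn a correct answer into a training pair: an untagged answer containing an injected error (the model input) and the same answer with the error tagged next to its correction (the target).

```python
from dataclasses import dataclass

# Hypothetical, user-defined error taxonomy (illustrative labels only).
TAXONOMY = {"numerical", "entity", "temporal"}


@dataclass
class SyntheticExample:
    context: str      # source passage the answer should be grounded in
    model_input: str  # answer containing the injected error, untagged
    target: str       # same answer with the error tagged and corrected
    category: str


def inject_error(context: str, answer: str, span: str,
                 wrong: str, category: str) -> SyntheticExample:
    """Swap the first occurrence of `span` for `wrong` and build a tagged target.

    The tag format <category><delete>..</delete><insert>..</insert></category>
    is an assumption for illustration, not necessarily the paper's format.
    """
    if category not in TAXONOMY:
        raise ValueError(f"unknown error category: {category!r}")
    if span not in answer:
        raise ValueError(f"span {span!r} not found in answer")
    # Model input: the erroneous value appears as plain text.
    corrupted = answer.replace(span, wrong, 1)
    # Target: mark the erroneous value for deletion and supply the correct one.
    tag = (f"<{category}><delete>{wrong}</delete>"
           f"<insert>{span}</insert></{category}>")
    target = answer.replace(span, tag, 1)
    return SyntheticExample(context, corrupted, target, category)


# Toy example with made-up figures.
ex = inject_error(
    context="Revenue for fiscal 2023 was $4.2 billion.",
    answer="The company reported revenue of $4.2 billion in fiscal 2023.",
    span="$4.2 billion",
    wrong="$5.1 billion",
    category="numerical",
)
print(ex.model_input)
print(ex.target)
```

Tagging the wrong value together with its correction lets a single target string supervise both detection (where the error is and what type it is) and editing (what should replace it).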

Findings

01

Fine-tuned Phi-4 improves binary detection F1 score by 8% and overall detection performance by 30% over OpenAI-o3.

02

Fine-tuned Phi-4-mini remains competitive despite having only 4 billion parameters.

03

The approach offers a generalizable way to improve the trustworthiness of language models beyond finance.

Abstract

Hallucinations in large language models pose a critical challenge for applications requiring factual reliability, particularly in high-stakes domains such as finance. This work presents an effective approach for detecting and editing factually incorrect content in model-generated responses based on the provided context. Given a user-defined domain-specific error taxonomy, we construct a synthetic dataset by inserting tagged errors into financial question-answering corpora and then fine-tune four language models, Phi-4, Phi-4-mini, Qwen3-4B, and Qwen3-14B, to detect and edit these factual inaccuracies. Our best-performing model, fine-tuned Phi-4, achieves an 8% improvement in binary F1 score and a 30% gain in overall detection performance compared to OpenAI-o3. Notably, our fine-tuned Phi-4-mini model, despite having only 4 billion parameters, maintains competitive performance with just…
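Binary F1 for detection can be computed as below. The sketch assumes, without confirmation from the abstract, that "binary" refers to a per-response label (does the response contain at least one error?) with the error-containing class treated as positive. The `binary_f1` function and the toy labels are hypothetical.

```python
def binary_f1(gold: list[bool], pred: list[bool]) -> float:
    """F1 for the positive class (response contains at least one error)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(g and p for g, p in zip(gold, pred))          # errors caught
    fp = sum((not g) and p for g, p in zip(gold, pred))    # false alarms
    fn = sum(g and (not p) for g, p in zip(gold, pred))    # errors missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: True means "this response contains a hallucination".
gold = [True, True, False, False, True]
pred = [True, False, False, True, True]
print(round(binary_f1(gold, pred), 4))
```

With two true positives, one false positive, and one false negative, precision and recall are both 2/3, so F1 is also 2/3.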
