Finetune-RAG: Fine-Tuning Language Models to Resist Hallucination in Retrieval-Augmented Generation

Zhan Peng Lee; Andre Lin; Calvin Tan

arXiv:2505.10792 · cs.CL · December 4, 2025

TL;DR

Finetune-RAG introduces a fine-tuning method with a new dataset to enhance the factual accuracy of retrieval-augmented language models, effectively reducing hallucinations caused by retrieval errors.

Contribution

The paper presents Finetune-RAG, a fine-tuning approach paired with a training dataset built to mimic real-world imperfect retrieval, and Bench-RAG, an LLM-as-a-judge evaluation pipeline for RAG models.

Findings

01. Finetune-RAG improves factual accuracy by 21.2% over the base model.

02. The new dataset mimics real-world retrieval imperfections.

03. Bench-RAG stress tests models under realistic imperfect retrieval scenarios.

Abstract

Retrieval-Augmented Generation (RAG) has emerged as a powerful framework to improve factuality in large language models (LLMs) by grounding their outputs in retrieved documents. However, ensuring perfect retrieval of relevant information remains challenging, and when irrelevant content is passed downstream to an LLM, it can lead to hallucinations. In this work, we propose Finetune-RAG, a simple and effective fine-tuning approach that features the first-of-its-kind RAG training dataset constructed to mimic real-world imperfections. Experimental results show that Finetune-RAG improves factual accuracy by 21.2% over the base model. We also propose Bench-RAG, an LLM-as-a-judge evaluation pipeline that stress tests models under realistic imperfect retrieval scenarios. Our codebase and dataset are fully open sourced for community use.
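
The abstract does not spell out the dataset schema, so the following is only a minimal sketch of what a training example for imperfect retrieval could look like: one passage that answers the question is mixed with one irrelevant passage, and the target answer is grounded only in the relevant one. The `RagExample` fields, the prompt template, and the two-document setup are all assumptions made for illustration, not the authors' published format.

```python
import random
from dataclasses import dataclass


@dataclass
class RagExample:
    """Hypothetical record layout; field names are illustrative assumptions."""
    question: str
    relevant_doc: str      # passage that actually answers the question
    distractor_doc: str    # plausible but irrelevant passage (simulated retrieval error)
    reference_answer: str  # answer grounded only in relevant_doc


def build_prompt(example: RagExample, rng: random.Random) -> str:
    """Present both passages in random order so position gives no hint."""
    docs = [example.relevant_doc, example.distractor_doc]
    rng.shuffle(docs)
    context = "\n\n".join(
        f"[Document {i + 1}]\n{doc}" for i, doc in enumerate(docs)
    )
    return (
        "Answer the question using only the documents that are relevant.\n\n"
        f"{context}\n\n"
        f"Question: {example.question}\n"
        "Answer:"
    )


def to_training_pair(example: RagExample, rng: random.Random) -> dict:
    """Turn one record into a prompt/completion pair for supervised fine-tuning."""
    return {
        "prompt": build_prompt(example, rng),
        "completion": " " + example.reference_answer,
    }


# Made-up example data, used only to exercise the functions above.
example = RagExample(
    question="What year was the bridge opened?",
    relevant_doc="The bridge was opened to traffic in 1932.",
    distractor_doc="A nearby tunnel was completed in 1951.",
    reference_answer="The bridge was opened in 1932.",
)
pair = to_training_pair(example, random.Random(0))
print(pair["prompt"])
print(pair["completion"])
```

Pairs of this shape could be passed to any standard supervised fine-tuning loop; the point of the sketch is only that the model is shown irrelevant context at training time and rewarded for ignoring it.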

Code & Models

Repositories

Pints-AI/Finetune-Bench-RAG
PyTorch · Official

Datasets

pints-ai/Finetune-RAG
dataset · 70 downloads

Taxonomy

Topics: Topic Modeling · Misinformation and Its Impacts · Mental Health via Writing

Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Layer Normalization · Softmax · Attention Dropout · WordPiece · Residual Connection · Linear Layer · Byte Pair Encoding