Fine-Tuning vs. RAG for Multi-Hop Question Answering with Novel Knowledge

Zhuoyi Yang; Yurun Song; Iftekhar Ahmed; Ian Harris

arXiv:2601.07054·cs.CL·January 13, 2026

TL;DR

This paper compares fine-tuning and retrieval-augmented generation (RAG) for multi-hop question answering. Retrieval-based approaches help most when the required knowledge is temporally novel, while supervised fine-tuning achieves the highest overall accuracy.

Contribution

The paper systematically evaluates parametric and non-parametric knowledge injection methods on open-source LLMs for multi-hop QA, highlighting the effectiveness of retrieval-augmented generation.

Findings

01 Retrieval-augmented generation significantly improves accuracy with novel knowledge.

02 Supervised fine-tuning achieves the highest overall accuracy.

03 Unsupervised fine-tuning offers limited gains over base models.

Abstract

Multi-hop question answering is widely used to evaluate the reasoning capabilities of large language models (LLMs), as it requires integrating multiple pieces of supporting knowledge to arrive at a correct answer. While prior work has explored different mechanisms for providing knowledge to LLMs, such as fine-tuning and retrieval-augmented generation (RAG), their relative effectiveness for multi-hop question answering remains insufficiently understood, particularly when the required knowledge is temporally novel. In this paper, we systematically compare parametric and non-parametric knowledge injection methods for open-domain multi-hop question answering. We evaluate unsupervised fine-tuning (continual pretraining), supervised fine-tuning, and retrieval-augmented generation across three 7B-parameter open-source LLMs. Experiments are conducted on two benchmarks: QASC, a standard…
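
As a rough illustration of the non-parametric route, the sketch below contrasts a closed-book prompt with a retrieval-augmented prompt for a two-fact question. It is not the paper's pipeline: the corpus, the question, the word-overlap retriever, and the names `retrieve`, `closed_book_prompt`, and `rag_prompt` are all assumptions made up for this example, and a real system would typically use a stronger retriever and send the prompt to an LLM.

```python
import re

# Toy corpus (hypothetical): two supporting facts plus one distractor.
CORPUS = [
    "Differential heating of air produces wind.",
    "Wind is used for producing electricity.",
    "Bees pollinate flowering plants.",
]

QUESTION = "Differential heating of air can be harnessed for producing what?"


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, corpus, k=2):
    """Rank passages by word overlap with the query and return the top k.

    This is a stand-in lexical retriever chosen for simplicity.
    """
    query_tokens = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda passage: len(query_tokens & tokenize(passage)),
        reverse=True,
    )
    return ranked[:k]


def closed_book_prompt(question):
    """Prompt with no external knowledge: the model must rely on its weights."""
    return f"Question: {question}\nAnswer:"


def rag_prompt(question, corpus, k=2):
    """Prompt that places the retrieved facts in front of the question."""
    facts = retrieve(question, corpus, k)
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Context:\n{context}\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    print(closed_book_prompt(QUESTION))
    print()
    print(rag_prompt(QUESTION, CORPUS))
```

In this toy case both supporting facts land in the context, so the model can compose them at inference time. With fine-tuning, the same facts would instead have to be absorbed into the model's parameters before the question is asked.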

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Expert finding and Q&A systems