Fine-Tuning vs. RAG for Multi-Hop Question Answering with Novel Knowledge
Zhuoyi Yang, Yurun Song, Iftekhar Ahmed, Ian Harris

TL;DR
This paper compares fine-tuning and retrieval-augmented generation (RAG) for multi-hop question answering. Retrieval-based approaches help most when the required knowledge is temporally novel, while supervised fine-tuning achieves the highest overall accuracy.
Contribution
The paper systematically evaluates parametric (fine-tuning) and non-parametric (retrieval) knowledge injection methods on three 7B-parameter open-source LLMs for multi-hop QA, highlighting the effectiveness of retrieval-augmented generation.
Findings
Retrieval-augmented generation significantly improves accuracy with novel knowledge.
Supervised fine-tuning achieves the highest overall accuracy.
Unsupervised fine-tuning offers limited gains over base models.
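To make the non-parametric side of the comparison concrete, here is a minimal sketch of retrieval-augmented prompting for a question that needs two supporting facts. It is an illustration under stated assumptions, not the paper's pipeline: the word-overlap retriever, the function names (`tokenize`, `retrieve`, `build_prompt`), the toy corpus, and the prompt wording are all hypothetical, and a real system would typically use a dense or BM25 retriever and pass the prompt to an LLM.

```python
import re

# Toy knowledge store (hypothetical): two facts that chain together
# plus one distractor that shares no words with the question.
CORPUS = [
    "Cats are mammals.",
    "Wind is used for producing electricity.",
    "Differential heating of air produces wind.",
]
QUESTION = "What can differential heating of air be harnessed for?"


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, corpus, k=2):
    """Return the k facts sharing the most word tokens with the question.

    Word overlap is a stand-in for a real retriever (assumption).
    """
    q_tokens = tokenize(question)
    ranked = sorted(
        corpus,
        key=lambda fact: len(q_tokens & tokenize(fact)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, facts):
    """Place the retrieved facts in the prompt so the model can combine them."""
    context = "\n".join(f"{i}. {fact}" for i, fact in enumerate(facts, start=1))
    return (
        "Use the facts below to answer the question.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


facts = retrieve(QUESTION, CORPUS, k=2)
prompt = build_prompt(QUESTION, facts)
```

The knowledge lives in `CORPUS` rather than in model weights, so updating it requires no retraining. Note that single-pass word overlap only weakly matches the second-hop fact here; multi-hop retrieval often needs a stronger retriever or iterative retrieval.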
Abstract
Multi-hop question answering is widely used to evaluate the reasoning capabilities of large language models (LLMs), as it requires integrating multiple pieces of supporting knowledge to arrive at a correct answer. While prior work has explored different mechanisms for providing knowledge to LLMs, such as fine-tuning and retrieval-augmented generation (RAG), their relative effectiveness for multi-hop question answering remains insufficiently understood, particularly when the required knowledge is temporally novel. In this paper, we systematically compare parametric and non-parametric knowledge injection methods for open-domain multi-hop question answering. We evaluate unsupervised fine-tuning (continual pretraining), supervised fine-tuning, and retrieval-augmented generation across three 7B-parameter open-source LLMs. Experiments are conducted on two benchmarks: QASC, a standard…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Expert finding and Q&A systems
