TL;DR
This paper presents fs1, a method that enhances LLM factuality by grounding reasoning traces in knowledge graph paths, leading to improved performance on complex QA tasks.
Contribution
fs1 is a technique that fine-tunes instruction-tuned LLMs on reasoning traces grounded in knowledge graph (KG) paths, with the goal of improving factual reasoning accuracy.
Findings
fs1-tuned models outperform their instruction-tuned counterparts by 6-14 absolute points on pass@16 under parallel sampling.
fs1 considerably improves performance on questions requiring 3 or more hops on KG paths and on questions with numerical answers.
In single-pass inference, smaller LLMs show the largest improvements.
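For readers unfamiliar with the pass@16 metric cited above, the following is a minimal sketch of how a pass@k score is commonly computed, using the widely used unbiased estimator 1 - C(n-c, k) / C(n, k). It is an assumption that the paper scores pass@16 this way; the function names `pass_at_k` and `mean_pass_at_k` are hypothetical and not taken from the authors' code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct.

    n: number of samples drawn for a question
    c: number of those samples judged correct
    k: sampling budget being evaluated (k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k incorrect samples: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # 1 minus the fraction of size-k subsets made up only of incorrect samples.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over questions, given the correct-sample count for each."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


if __name__ == "__main__":
    # Illustrative counts only (not results from the paper): 4 questions, 16 samples each.
    print(mean_pass_at_k([0, 1, 16, 0], n=16, k=16))
```

When n equals k, as with 16 samples and pass@16, the estimator reduces to checking whether any sample for a question is correct, so the benchmark score is the fraction of questions with at least one correct sample.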
Abstract
We introduce fs1, a simple yet effective method that improves the factuality of reasoning traces by collecting them from large reasoning models and grounding them in knowledge graph (KG) paths. We fine-tune eight instruction-tuned Large Language Models (LLMs) on 3.9K factually grounded reasoning traces and rigorously evaluate them on six complex open-domain question-answering (QA) benchmarks encompassing 23.9K questions. Our results demonstrate that our fs1-tuned models consistently outperform their instruction-tuned counterparts with parallel sampling by 6-14 absolute points (pass@16). Our detailed analysis shows that fs1 considerably improves model performance on more complex questions (requiring 3 or more hops on KG paths) and on numerical answer types compared to the baselines. Furthermore, in single-pass inference, we notice that smaller LLMs show the largest improvements. While prior works…
Code & Models
- 🤗 jjzha/Qwen2.5-0.5B-Instruct-fs1-2708 · model · 2 dl
- 🤗 jjzha/Qwen2.5-1.5B-Instruct-fs1-2708 · model · 5 dl
- 🤗 jjzha/Qwen2.5-3B-Instruct-fs1-2708 · model · 3 dl
- 🤗 jjzha/Qwen2.5-7B-Instruct-fs1-2708 · model · 5 dl · ♡ 1
- 🤗 jjzha/Qwen2.5-14B-Instruct-fs1-2708 · model · 1 dl
- 🤗 jjzha/Qwen2.5-32B-Instruct-fs1-2708 · model
- 🤗 jjzha/Qwen2.5-0.5B-Instruct-rt-2708 · model · 3 dl
- 🤗 jjzha/Qwen2.5-1.5B-Instruct-rt-2708 · model · 4 dl
- 🤗 jjzha/Qwen2.5-3B-Instruct-rt-2708 · model · 7 dl · ♡ 1
- 🤗 jjzha/Qwen2.5-7B-Instruct-rt-2708 · model · 6 dl
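The checkpoints listed above follow a regular naming pattern, so one way to load them is sketched below using the Hugging Face `transformers` library. This is a sketch under assumptions: the repository IDs are taken to be the listed names without the trailing "model" type label, the helper names `checkpoint_repo_id` and `load_checkpoint` are hypothetical, and the page does not say what the `rt` suffix stands for, so it is treated only as an opaque variant name.

```python
# Sizes listed on this page for each variant (assumption: the list is complete).
LISTED_SIZES = {
    "fs1": ("0.5B", "1.5B", "3B", "7B", "14B", "32B"),
    "rt": ("0.5B", "1.5B", "3B", "7B"),
}


def checkpoint_repo_id(size: str, variant: str = "fs1") -> str:
    """Build the Hub repository ID for a listed checkpoint (hypothetical helper)."""
    if variant not in LISTED_SIZES:
        raise ValueError(f"unknown variant: {variant!r}")
    if size not in LISTED_SIZES[variant]:
        raise ValueError(f"size {size!r} is not listed for variant {variant!r}")
    return f"jjzha/Qwen2.5-{size}-Instruct-{variant}-2708"


def load_checkpoint(size: str, variant: str = "fs1"):
    """Download and return (tokenizer, model); needs `transformers` and network access."""
    # Imported lazily so the ID helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = checkpoint_repo_id(size, variant)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model


if __name__ == "__main__":
    print(checkpoint_repo_id("7B"))
```

Calling `load_checkpoint("0.5B")` would fetch the smallest fs1 checkpoint, which is the cheapest one to try first.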
