From Sufficiency to Reflection: Reinforcement-Guided Thinking Quality in Retrieval-Augmented Reasoning for LLMs
Jie He, Victor Gutiérrez-Basulto, Jeff Z. Pan

TL;DR
This paper introduces TIRESRAG-R1, a reinforcement learning framework that improves reasoning quality in retrieval-augmented LLMs by addressing common failure patterns through a multi-dimensional reward system and reflection strategies.
Contribution
It proposes a novel think-retrieve-reflect framework with a multi-dimensional reward system to enhance reasoning and stability in retrieval-augmented LLMs.
Findings
Outperforms prior RAG methods on multi-hop QA datasets
Generalizes well to single-hop tasks
Improves reasoning stability and accuracy
Abstract
Reinforcement learning-based retrieval-augmented generation (RAG) methods enhance the reasoning abilities of large language models (LLMs). However, most rely only on final-answer rewards, overlooking intermediate reasoning quality. This paper analyzes existing RAG reasoning models and identifies three main failure patterns: (1) information insufficiency, meaning the model fails to retrieve adequate support; (2) faulty reasoning, where logical or content-level flaws appear despite sufficient information; and (3) answer-reasoning inconsistency, where a valid reasoning chain leads to a mismatched final answer. We propose TIRESRAG-R1, a novel framework using a think-retrieve-reflect process and a multi-dimensional reward system to improve reasoning and stability. TIRESRAG-R1 introduces: (1) a sufficiency reward to encourage thorough retrieval; (2) a reasoning quality reward to assess the…
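To make the idea of a multi-dimensional reward concrete, here is a minimal sketch of how several reward signals might be folded into one scalar for a policy update. It is not the authors' implementation. The names `RewardSignals` and `combined_reward`, the [0, 1] score range, the weights, and the weighted-average rule are all assumptions made for illustration. Only the components named in the visible part of the abstract are included (the final-answer signal, the sufficiency reward, and the reasoning quality reward); the abstract is truncated, so any further components are left out.

```python
from dataclasses import dataclass


@dataclass
class RewardSignals:
    """Hypothetical per-rollout scores, each assumed to lie in [0, 1]."""
    answer: float             # final-answer correctness
    sufficiency: float        # does the retrieved evidence support an answer?
    reasoning_quality: float  # quality of the intermediate reasoning chain


def combined_reward(signals, w_answer=1.0, w_sufficiency=0.5, w_reasoning=0.5):
    """Weighted average of the components; the weights are illustrative only."""
    total_weight = w_answer + w_sufficiency + w_reasoning
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = (
        w_answer * signals.answer
        + w_sufficiency * signals.sufficiency
        + w_reasoning * signals.reasoning_quality
    )
    return weighted / total_weight


# A rollout with the right answer but thin evidence and weak reasoning scores
# lower than one that is strong on every dimension.
lucky = combined_reward(RewardSignals(answer=1.0, sufficiency=0.0, reasoning_quality=0.0))
solid = combined_reward(RewardSignals(answer=1.0, sufficiency=1.0, reasoning_quality=1.0))
print(lucky, solid)
```

Under these assumed weights, a correct answer reached without adequate support earns only part of the available reward, which is the kind of pressure on intermediate reasoning that a final-answer-only reward cannot apply.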
