From Sufficiency to Reflection: Reinforcement-Guided Thinking Quality in Retrieval-Augmented Reasoning for LLMs

Jie He; Victor Gutiérrez-Basulto; Jeff Z. Pan

arXiv:2507.22716·cs.CL·August 7, 2025

TL;DR

This paper introduces TIRESRAG-R1, a reinforcement learning framework that improves reasoning quality in retrieval-augmented LLMs by addressing common failure patterns through a multi-dimensional reward system and reflection strategies.

Contribution

It proposes a novel think-retrieve-reflect framework with a multi-dimensional reward system to enhance reasoning and stability in retrieval-augmented LLMs.

Findings

1. Outperforms prior RAG methods on multi-hop QA datasets.
2. Generalizes well to single-hop tasks.
3. Improves reasoning stability and accuracy.

Abstract

Reinforcement learning-based retrieval-augmented generation (RAG) methods enhance the reasoning abilities of large language models (LLMs). However, most rely only on final-answer rewards, overlooking intermediate reasoning quality. This paper analyzes existing RAG reasoning models and identifies three main failure patterns: (1) information insufficiency, meaning the model fails to retrieve adequate support; (2) faulty reasoning, where logical or content-level flaws appear despite sufficient information; and (3) answer-reasoning inconsistency, where a valid reasoning chain leads to a mismatched final answer. We propose TIRESRAG-R1, a novel framework using a think-retrieve-reflect process and a multi-dimensional reward system to improve reasoning and stability. TIRESRAG-R1 introduces: (1) a sufficiency reward to encourage thorough retrieval; (2) a reasoning quality reward to assess the…
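The abstract describes the reward design only at a high level, and the text above is truncated. The Python sketch below is a rough illustration of why a multi-dimensional reward can separate rollouts that an answer-only reward treats identically. It assumes each dimension is scored in [0, 1] and that the scores are combined by a weighted sum. The `RolloutScores` class, the `combined_reward` function, and the weights are hypothetical and are not TIRESRAG-R1's actual formulation.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# HYPOTHETICAL: field names, the [0, 1] score range, and the weights below are
# illustrative assumptions, not the reward definitions used by TIRESRAG-R1.
@dataclass
class RolloutScores:
    answer: float       # final-answer correctness (e.g. exact match)
    sufficiency: float  # was the retrieved evidence judged sufficient?
    reasoning: float    # judged quality of the intermediate reasoning chain

DEFAULT_WEIGHTS = {"answer": 1.0, "sufficiency": 0.5, "reasoning": 0.5}

def combined_reward(scores: RolloutScores,
                    weights: Optional[Mapping[str, float]] = None) -> float:
    """Combine reward dimensions with an assumed weighted sum."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    total = 0.0
    for name, weight in weights.items():
        value = getattr(scores, name)
        # Reject scores outside the assumed [0, 1] range.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must lie in [0, 1], got {value}")
        total += weight * value
    return total

# Both rollouts reach the correct answer, so an answer-only reward ties them.
well_grounded = RolloutScores(answer=1.0, sufficiency=1.0, reasoning=1.0)
lucky_guess = RolloutScores(answer=1.0, sufficiency=0.0, reasoning=0.5)

print(combined_reward(well_grounded))  # 2.0
print(combined_reward(lucky_guess))    # 1.25
```

Passing `weights={"answer": 1.0}` reduces the function to an answer-only reward. In that case both rollouts score 1.0, which is the blind spot the abstract attributes to final-answer rewards.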
