SRR-Judge: Step-Level Rating and Refinement for Enhancing Search-Integrated Reasoning in Search Agents
Chen Zhang, Kuicai Dong, Dexun Li, Wenjun Li, Qu Yang, Wei Han, Yong Liu

TL;DR
SRR-Judge introduces a step-level assessment framework for search agents, improving reasoning quality and performance by guiding and refining their actions through fine-grained evaluations.
Contribution
It presents SRR-Judge, a novel step-level rating system integrated into search agents, enabling more reliable reasoning assessment and effective fine-tuning for better performance.
Findings
SRR-Judge outperforms larger models in step evaluation accuracy.
Aligning policies with SRR-Judge annotations improves search agent performance.
Achieves over 10% improvement in pass@1 on deep search benchmarks.
Abstract
Recent deep search agents built on large reasoning models (LRMs) excel at complex question answering by iteratively planning, acting, and gathering evidence, a capability known as search-integrated reasoning. However, mainstream approaches often train this ability using only outcome-based supervision, neglecting the quality of intermediate thoughts and actions. We introduce SRR-Judge, a framework for reliable step-level assessment of reasoning and search actions. Integrated into a modified ReAct-style rate-and-refine workflow, SRR-Judge provides fine-grained guidance for search-integrated reasoning and enables efficient post-training annotation. Using SRR-annotated data, we apply an iterative rejection sampling fine-tuning procedure to enhance the deep search capability of the base agent. Empirically, SRR-Judge delivers more reliable step-level evaluations than much larger models such…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsExpert finding and Q&A systems · Topic Modeling · Information Retrieval and Search Behavior
