CriticSearch: Fine-Grained Credit Assignment for Search Agents via a Retrospective Critic
Yaocheng Zhang, Haohuan Huang, Zijun Song, Yuanheng Zhu, Qichao Zhang, Zijie Zhao, Dongbin Zhao

TL;DR
CriticSearch introduces a retrospective critic mechanism that provides dense, turn-level feedback to improve training stability and performance of search agents in complex reasoning tasks.
Contribution
CriticSearch proposes a fine-grained credit assignment framework in which a retrospective critic supplies dense, turn-level rewards for training search agents.
Findings
Outperforms existing baselines on multi-hop reasoning benchmarks.
Achieves faster convergence and improved training stability.
Achieves higher overall performance on complex question-answering tasks.
Abstract
Tool-Integrated Reasoning (TIR) with search engines enables large language models to iteratively retrieve up-to-date external knowledge, enhancing adaptability and generalization in complex question-answering tasks. However, existing search agent pipelines typically depend on reinforcement-learning-based optimization, which often suffers from sparse outcome rewards, leading to inefficient exploration and unstable training. We introduce CriticSearch, a fine-grained credit-assignment framework that supplies dense, turn-level feedback via a retrospective critic mechanism. During training, a frozen, asymmetric critique LLM retrospectively evaluates each turn using privileged information from the full trajectory and gold answers, converting these assessments into stable, dense rewards that guide policy improvement. Experimental results across diverse multi-hop reasoning benchmarks…
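The turn-level credit assignment described in the abstract can be illustrated with a minimal sketch. The abstract does not say how critic assessments are converted into rewards, so everything here is an assumption made for illustration: the binary good/bad judgment, the additive combination with the outcome reward, the `weight` parameter, and the names `Turn`, `dense_turn_rewards`, and `toy_critic`. The keyword-matching critic is only a stand-in for the frozen critique LLM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    action: str       # hypothetical: a search query or answer emitted by the policy
    observation: str  # hypothetical: text returned by the search engine


# A critic sees the full trajectory, the index of the turn being judged,
# and the gold answer (the "privileged information"), and returns True
# if it judges that turn to be helpful.
Critic = Callable[[List[Turn], int, str], bool]


def dense_turn_rewards(
    turns: List[Turn],
    gold_answer: str,
    outcome_reward: float,
    critic: Critic,
    weight: float = 0.5,
) -> List[float]:
    """Assign one reward per turn instead of a single trajectory-level reward.

    Assumed scheme (not taken from the paper): each turn receives the sparse
    outcome reward plus a bonus or penalty of size `weight`, depending on the
    critic's retrospective judgment of that turn.
    """
    rewards = []
    for index in range(len(turns)):
        judged_good = critic(turns, index, gold_answer)
        turn_score = 1.0 if judged_good else -1.0
        rewards.append(outcome_reward + weight * turn_score)
    return rewards


def toy_critic(turns: List[Turn], index: int, gold_answer: str) -> bool:
    """Stand-in for a frozen critique LLM: a turn counts as good if its
    retrieved text mentions the gold answer."""
    return gold_answer.lower() in turns[index].observation.lower()


trajectory = [
    Turn("search: capital of France", "Paris is the capital of France."),
    Turn("search: weather today", "It is sunny."),
]
rewards = dense_turn_rewards(trajectory, "Paris", outcome_reward=1.0, critic=toy_critic)
print(rewards)
```

Under these assumptions, the first turn, which retrieved the gold answer, receives a higher reward than the second, even though both belong to the same successful trajectory. That separation between turns is what a single outcome reward cannot express.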
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
