ProCeedRL: Process Critic with Exploratory Demonstration Reinforcement Learning for LLM Agentic Reasoning
Jingyue Gao, Yanjiang Guo, Xiaoshuai Chen, Jianyu Chen

TL;DR
ProCeedRL introduces a process-level critic and reflection-based demonstrations to enhance exploration and reasoning in large language model agents for complex tasks.
Contribution
It presents a novel exploration strategy that actively intervenes using a process critic and demonstrations, improving over standard passive exploration methods.
Findings
ProCeedRL outperforms standard exploration strategies on complex tasks, including deep search and embodied tasks.
The approach significantly improves exploration efficiency and reasoning accuracy.
Abstract
Reinforcement Learning (RL) significantly enhances the reasoning abilities of large language models (LLMs), yet applying it to multi-turn agentic tasks remains challenging due to the long-horizon nature of interactions and the stochasticity of environmental feedback. We identify a structural failure mode in agentic exploration: suboptimal actions elicit noisy observations into misleading contexts, which further weaken subsequent decision-making, making recovery increasingly difficult. This cumulative feedback loop of errors renders standard exploration strategies ineffective and susceptible to the model's reasoning and the environment's randomness. To mitigate this issue, we propose ProCeedRL: Process Critic with Explorative Demonstration RL, shifting exploration from passive selection to active intervention. ProCeedRL employs a process-level critic to monitor interactions in real time,…
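The idea of a critic that watches each step and intervenes can be illustrated with a toy rollout loop. This is only a sketch under stated assumptions, not the authors' implementation: the abstract above is truncated, so what ProCeedRL actually does after the critic flags a step is not specified here. The function names (rollout_with_critic, critic, reflect), the score threshold, and the retry-once behavior are all hypothetical.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, str]  # (action, observation)


def rollout_with_critic(
    policy: Callable[[List[Step]], str],
    env_step: Callable[[str], str],
    critic: Callable[[List[Step], str, str], float],
    reflect: Callable[[List[Step], str, str], str],
    max_turns: int = 4,
    threshold: float = 0.5,
) -> Tuple[List[Step], List[int]]:
    """Run one episode, replacing steps the critic scores below `threshold`.

    All arguments are hypothetical stand-ins: `critic` scores a single step,
    and `reflect` proposes a revised action for a step that was flagged.
    """
    history: List[Step] = []
    intervened: List[int] = []
    for turn in range(max_turns):
        action = policy(history)
        observation = env_step(action)
        if critic(history, action, observation) < threshold:
            # Assumed intervention: retry the step once with a revised action,
            # so the flagged observation never enters the context.
            action = reflect(history, action, observation)
            observation = env_step(action)
            intervened.append(turn)
        history.append((action, observation))
    return history, intervened


# Toy stubs: the policy issues one poor query on its second turn.
def toy_policy(history: List[Step]) -> str:
    return "search: vague query" if len(history) == 1 else "search: precise query"


def toy_env(action: str) -> str:
    return "noisy result" if "vague" in action else "relevant result"


def toy_critic(history: List[Step], action: str, observation: str) -> float:
    return 0.0 if "noisy" in observation else 1.0


def toy_reflect(history: List[Step], action: str, observation: str) -> str:
    return action.replace("vague", "precise")


history, intervened = rollout_with_critic(
    toy_policy, toy_env, toy_critic, toy_reflect, max_turns=3
)
```

In this toy run the critic flags only the second turn, and the noisy observation is swapped out before it can shape later decisions. That is the error-accumulation loop the abstract describes, cut at the step where it would begin.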
