Learning Without Critics? Revisiting GRPO in Classical Reinforcement Learning Environments
Bryan L. M. de Oliveira, Felipe V. Frujeri, Marcos P. C. M. Queiroz, Luana G. B. Martins, Telma W. de L. Soares, Luckeciano C. Melo

TL;DR
This paper systematically evaluates Group Relative Policy Optimization (GRPO) in classical reinforcement learning, revealing that critics are crucial for long-horizon tasks and highlighting conditions where critic-free methods are viable.
Contribution
First comprehensive study of GRPO in classical RL environments, identifying when critic-free approaches succeed or fail compared to traditional methods.
Findings
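Learned critics remain essential for long-horizon tasks like HalfCheetah; critic-free baselines underperform PPO except in short-horizon environments such as CartPole.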
GRPO benefits from high discount factors (gamma = 0.99), with exceptions in some environments.
Smaller group sizes outperform larger ones in batch grouping strategies.
Abstract
Group Relative Policy Optimization (GRPO) has emerged as a scalable alternative to Proximal Policy Optimization (PPO) by eliminating the learned critic and instead estimating advantages through group-relative comparisons of trajectories. This simplification raises fundamental questions about the necessity of learned baselines in policy-gradient methods. We present the first systematic study of GRPO in classical single-task reinforcement learning environments, spanning discrete and continuous control tasks. Through controlled ablations isolating baselines, discounting, and group sampling, we reveal three key findings: (1) learned critics remain essential for long-horizon tasks: all critic-free baselines underperform PPO except in short-horizon environments like CartPole where episodic returns can be effective; (2) GRPO benefits from high discount factors (gamma = 0.99) except in…
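The group-relative advantage estimate described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes the commonly described GRPO normalization (subtract the group mean return, divide by the group standard deviation) and applies discounting per trajectory. The function names, the `eps` constant, and the toy reward sequences are hypothetical.

```python
from statistics import mean, pstdev


def discounted_return(rewards, gamma=0.99):
    """Discounted sum of one trajectory's rewards: sum_t gamma^t * r_t."""
    total = 0.0
    # Accumulate backwards so each step applies one more factor of gamma.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def group_relative_advantages(group_rewards, gamma=0.99, eps=1e-8):
    """One scalar advantage per trajectory, normalized within the group.

    No learned critic is involved: the baseline is the group's mean return.
    Assumption: division by the group standard deviation, as in the
    commonly described GRPO formulation.
    """
    returns = [discounted_return(r, gamma) for r in group_rewards]
    mu = mean(returns)
    sigma = pstdev(returns)
    return [(g - mu) / (sigma + eps) for g in returns]


# Toy group of three trajectories with +1 reward per step
# (CartPole-style), differing only in episode length.
group = [[1.0] * 10, [1.0] * 20, [1.0] * 30]
advantages = group_relative_advantages(group)
```

Every step of a trajectory receives the same scalar advantage under this scheme, which is one plausible reason a learned critic could matter more as horizons grow: per-step credit assignment is left to the episodic return alone.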
Taxonomy
Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research
