Extending Group Relative Policy Optimization to Continuous Control: A Theoretical Framework for Robotic Reinforcement Learning
Rajat Khanda, Mohammad Baqar, Sambuddha Chakrabarti, and Satyasaran Changdar

TL;DR
This paper extends Group Relative Policy Optimization (GRPO) to continuous control tasks in robotics. It provides a theoretical framework that addresses high-dimensional action spaces, sparse rewards, and temporal dynamics, analyzes convergence and computational complexity, and leaves empirical validation to future work.
Contribution
The paper introduces a theoretical framework for applying GRPO to continuous control, built on trajectory-based policy clustering, state-aware advantage estimation, and regularized policy updates designed for robotics.
Findings
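The theoretical analysis characterizes the framework's convergence properties and computational complexity.
The framework targets high-dimensional action spaces, sparse rewards, and temporal dynamics.
The work lays a foundation for future empirical validation on robotic locomotion and manipulation tasks.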
Abstract
Group Relative Policy Optimization (GRPO) has shown promise in discrete action spaces by eliminating value function dependencies through group-based advantage estimation. However, its application to continuous control remains unexplored, limiting its utility in robotics where continuous actions are essential. This paper presents a theoretical framework extending GRPO to continuous control environments, addressing challenges in high-dimensional action spaces, sparse rewards, and temporal dynamics. Our approach introduces trajectory-based policy clustering, state-aware advantage estimation, and regularized policy updates designed for robotic applications. We provide theoretical analysis of convergence properties and computational complexity, establishing a foundation for future empirical validation in robotic systems including locomotion and manipulation tasks.
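The group-based advantage estimation mentioned in the abstract can be illustrated with a minimal sketch. It shows only the commonly described GRPO normalization, where each sampled return is scored against the mean and standard deviation of its own group instead of a learned value function. It is not the paper's trajectory clustering or state-aware estimator, which this summary does not specify. The function name `group_relative_advantages`, the `eps` parameter, and the example returns are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(returns, eps=1e-8):
    """Score each return relative to its group (hypothetical helper).

    Assumes `returns` holds the total returns of trajectories sampled
    from the same starting condition, so they form one comparison group.
    """
    if not returns:
        raise ValueError("group must contain at least one return")
    mu = mean(returns)
    sigma = pstdev(returns)  # population standard deviation of the group
    # eps keeps the division defined when every return in the group is equal
    return [(r - mu) / (sigma + eps) for r in returns]


# Made-up returns for four trajectories in one group (illustrative only).
group_returns = [1.0, 0.0, 0.0, 3.0]
advantages = group_relative_advantages(group_returns)
```

Trajectories that beat the group average receive positive advantages and the rest receive negative ones, so no critic network is needed. In a sparse-reward group where every return is identical, all advantages collapse to zero and the group contributes no learning signal, which is one reason sparse rewards are listed among the challenges.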
