Extending Group Relative Policy Optimization to Continuous Control: A Theoretical Framework for Robotic Reinforcement Learning

Rajat Khanda, Mohammad Baqar, Sambuddha Chakrabarti, and Satyasaran Changdar

arXiv:2507.19555 · cs.RO · July 29, 2025

TL;DR

This paper extends Group Relative Policy Optimization (GRPO) to continuous control tasks in robotics. It provides a theoretical framework that addresses high-dimensional action spaces, sparse rewards, and temporal dynamics, together with a convergence analysis and plans for future empirical validation.

Contribution

The paper introduces a theoretical framework for applying GRPO to continuous control, including trajectory clustering and advantage estimation tailored to robotics.
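To make "trajectory clustering and advantage estimation" concrete, here is a minimal Python sketch of one possible reading: each trajectory's return is standardized against the other trajectories in its own cluster, mirroring GRPO's group normalization. It assumes a hypothetical upstream clustering step has already assigned each trajectory a label; the paper's actual clustering criterion is not described here. The function name `cluster_relative_advantages` and the `eps` parameter are illustrative, not taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def cluster_relative_advantages(returns, cluster_ids, eps=1e-8):
    """Standardize each trajectory's return within its (hypothetical) cluster.

    returns:     one scalar episode return per trajectory.
    cluster_ids: one hashable label per trajectory, assumed to come from some
                 earlier trajectory-clustering step (e.g. k-means on features).
    """
    if len(returns) != len(cluster_ids):
        raise ValueError("returns and cluster_ids must have the same length")

    # Collect the returns belonging to each cluster.
    groups = defaultdict(list)
    for r, c in zip(returns, cluster_ids):
        groups[c].append(r)

    # Per-cluster baseline (mean) and scale (population std).
    stats = {c: (mean(rs), pstdev(rs)) for c, rs in groups.items()}

    # A singleton cluster has std 0, so its trajectory gets advantage 0.
    return [(r - stats[c][0]) / (stats[c][1] + eps)
            for r, c in zip(returns, cluster_ids)]
```

For example, `cluster_relative_advantages([1.0, 3.0, 10.0, 20.0], [0, 0, 1, 1])` yields roughly `[-1, 1, -1, 1]`: each trajectory is judged only against trajectories in its own cluster, so the large absolute returns of cluster 1 do not swamp cluster 0.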

Findings

01. Provides a theoretical analysis of the framework's convergence properties.
02. Addresses the challenges of high-dimensional action spaces and sparse rewards.
03. Lays a foundation for future experiments on robotic systems.

Abstract

Group Relative Policy Optimization (GRPO) has shown promise in discrete action spaces by eliminating value function dependencies through group-based advantage estimation. However, its application to continuous control remains unexplored, limiting its utility in robotics, where continuous actions are essential. This paper presents a theoretical framework extending GRPO to continuous control environments, addressing challenges in high-dimensional action spaces, sparse rewards, and temporal dynamics. Our approach introduces trajectory-based policy clustering, state-aware advantage estimation, and regularized policy updates designed for robotic applications. We provide a theoretical analysis of convergence properties and computational complexity, establishing a foundation for future empirical validation on robotic systems, including locomotion and manipulation tasks.
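The group-based advantage estimation that lets GRPO drop the value function can be sketched for continuous actions as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes a diagonal Gaussian policy, one scalar return per rollout in a group (shared by every step of that rollout), a PPO-style clipped surrogate, and a closed-form KL between diagonal Gaussians as one possible form of "regularized policy update". All function names, the KL coefficient `beta`, and the toy numbers are invented for illustration.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(returns, eps=1e-8):
    """GRPO-style advantage: standardize each rollout's return within its group.

    The group mean serves as the baseline, so no learned critic is required.
    """
    mu, sigma = mean(returns), pstdev(returns)
    return [(r - mu) / (sigma + eps) for r in returns]


def gaussian_log_prob(action, mean_vec, log_std):
    """Log-density of `action` under a diagonal Gaussian, summed over action dims."""
    total = 0.0
    for a, m, ls in zip(action, mean_vec, log_std):
        z = (a - m) / math.exp(ls)
        total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
    return total


def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO-style clipped objective term for a single action (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


def diag_gaussian_kl(mean_p, log_std_p, mean_q, log_std_q):
    """Closed-form KL(p || q) between two diagonal Gaussians (one possible regularizer)."""
    kl = 0.0
    for mp, lp, mq, lq in zip(mean_p, log_std_p, mean_q, log_std_q):
        var_p, var_q = math.exp(2.0 * lp), math.exp(2.0 * lq)
        kl += lq - lp + (var_p + (mp - mq) ** 2) / (2.0 * var_q) - 0.5
    return kl


if __name__ == "__main__":
    # Toy group of G = 4 rollouts from the same start state (hypothetical numbers).
    returns = [1.0, 2.5, 0.5, 4.0]
    advantages = group_relative_advantages(returns)

    # One 2-D action per rollout; the new policy's mean has shifted slightly.
    actions = [[0.1, -0.2], [0.3, 0.0], [-0.4, 0.2], [0.5, 0.5]]
    old_mean, new_mean, log_std = [0.0, 0.0], [0.05, 0.0], [-0.5, -0.5]
    beta = 0.01  # hypothetical KL coefficient; old policy doubles as the reference

    surrogate = mean(
        clipped_surrogate(
            gaussian_log_prob(a, new_mean, log_std),
            gaussian_log_prob(a, old_mean, log_std),
            adv,
        )
        for a, adv in zip(actions, advantages)
    )
    objective = surrogate - beta * diag_gaussian_kl(new_mean, log_std, old_mean, log_std)
    print(f"advantages={advantages}")
    print(f"objective={objective:.4f}")
```

The design choice that carries over from discrete GRPO is that the baseline comes from sibling rollouts rather than a critic; the continuous-control changes in this sketch are confined to the Gaussian log-density and the Gaussian KL. A real robotic setup would also need per-step credit assignment for long horizons, which this sketch deliberately leaves out.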
