Constrained Group Relative Policy Optimization

Roger Girgis; Rodrigue de Schaetzen; Luke Rowe; Azalée Robitaille; Christopher Pal; Liam Paull

arXiv:2602.05863 · cs.LG · February 9, 2026

TL;DR

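This paper extends Group Relative Policy Optimization (GRPO) to handle explicit behavioral constraints using a Lagrangian approach. It identifies an advantage-estimation failure in which normalizing reward and cost terms separately distorts their intended trade-off, proposes a scalarized advantage that avoids it, and demonstrates improved constraint satisfaction in robotics tasks.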

Contribution

Introduces Constrained GRPO, a Lagrangian-based extension that effectively enforces behavioral constraints in policy learning, with a novel advantage scalarization method.
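The Lagrangian side of this idea can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a single constraint of the form E[c] ≤ d with indicator costs c ∈ {0, 1}, the relaxation max over θ, min over λ ≥ 0 of E[r] − λ(E[c] − d), and a standard projected dual-ascent update on λ. The names `indicator_cost` and `LagrangeMultiplier`, the step size, and the budget are hypothetical, and the paper's exact update rule may differ.

```python
from dataclasses import dataclass


def indicator_cost(violated: bool) -> int:
    """Hypothetical indicator cost: 1 if a behavioral constraint was violated, else 0."""
    return 1 if violated else 0


@dataclass
class LagrangeMultiplier:
    """Hypothetical helper for one constraint of the form E[c] <= budget."""
    budget: float       # allowed violation rate d (assumed parameter)
    lr: float = 0.05    # dual step size (assumed parameter)
    value: float = 0.0  # current lambda, projected to stay >= 0

    def update(self, costs: list[int]) -> float:
        # With 0/1 costs, the sample mean is the empirical violation rate.
        violation_rate = sum(costs) / len(costs)
        # Projected dual gradient ascent: lambda grows while violations exceed
        # the budget and shrinks (never below zero) when there is slack.
        self.value = max(0.0, self.value + self.lr * (violation_rate - self.budget))
        return self.value

    def penalized_reward(self, reward: float, cost: int) -> float:
        # Per-sample Lagrangian term r - lambda * c, maximized by the policy for fixed lambda.
        return reward - self.value * cost


lam = LagrangeMultiplier(budget=0.1, lr=0.5)
costs = [indicator_cost(v) for v in (True, False, False, True)]
lam.update(costs)
print(lam.value)  # 0.2: violation rate 0.5 exceeds the 0.1 budget, so lambda rises
```

Because the costs are indicators, the constraint directly bounds the violation rate, and λ acts as the exchange rate (in reward units) charged for each violation.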

Findings

01

Scalarized advantage preserves constraint enforcement.

02

Constrained GRPO improves constraint satisfaction in robotics tasks.

03

Naive advantage estimation can break constrained learning.
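Findings 01 and 03 can be made concrete with a short derivation. The notation here is our own reading of the abstract, not necessarily the paper's: a group of sampled responses with rewards r_i, indicator costs c_i ∈ {0, 1}, multiplier λ ≥ 0, group means r̄ and c̄, and group standard deviations σ_r and σ_c. The "naive" form is assumed to standardize each term separately before combining them, and the scalarized form to combine first and standardize once.

```latex
% Naive (assumed): standardize reward and cost separately, then combine
A_i^{\mathrm{naive}}
  = \frac{r_i - \bar r}{\sigma_r} - \lambda \, \frac{c_i - \bar c}{\sigma_c}
  = \frac{1}{\sigma_r}\Big[(r_i - \bar r)
      - \underbrace{\lambda \tfrac{\sigma_r}{\sigma_c}}_{\text{effective multiplier}} (c_i - \bar c)\Big]

% Scalarized (assumed): s_i = r_i - \lambda c_i, so \bar s = \bar r - \lambda \bar c
A_i^{\mathrm{scal}}
  = \frac{s_i - \bar s}{\sigma_s}
  = \frac{1}{\sigma_s}\Big[(r_i - \bar r) - \lambda (c_i - \bar c)\Big]
```

In the naive form the cost is effectively weighted by λσ_r/σ_c, a ratio that changes from group to group and is unrelated to the dual update on λ. In the scalarized form λ enters unchanged, and the only group-dependent factor, 1/σ_s, rescales both terms equally.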

Abstract

While Group Relative Policy Optimization (GRPO) has emerged as a scalable framework for critic-free policy learning, extending it to settings with explicit behavioral constraints remains underexplored. We introduce Constrained GRPO, a Lagrangian-based extension of GRPO for constrained policy optimization. Constraints are specified via indicator cost functions, enabling direct optimization of violation rates through a Lagrangian relaxation. We show that a naive multi-component treatment in advantage estimation can break constrained learning: mismatched component-wise standard deviations distort the relative importance of the different objective terms, which in turn corrupts the Lagrangian signal and prevents meaningful constraint enforcement. We formally derive this effect to motivate our scalarized advantage construction that preserves the intended trade-off between reward and…
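The difference between the two advantage constructions can be checked numerically. This is a sketch under stated assumptions: GRPO-style group normalization (subtract the group mean, divide by the group standard deviation plus a small ε), a "naive" variant that normalizes reward and cost separately, and a scalarized variant that normalizes r − λc once. The function names and the toy group are hypothetical and do not come from the paper.

```python
import statistics

EPS = 1e-8  # avoids division by zero when every sample in a group has the same value


def standardize(xs: list[float]) -> list[float]:
    """GRPO-style group normalization: subtract the group mean, divide by the group std."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mu) / (sd + EPS) for x in xs]


def naive_advantage(rewards, costs, lam):
    # Assumed naive variant: normalize reward and cost separately, then combine.
    # The cost term is implicitly rescaled by sigma_r / sigma_c relative to the reward.
    a_r = standardize(rewards)
    a_c = standardize(costs)
    return [ar - lam * ac for ar, ac in zip(a_r, a_c)]


def scalarized_advantage(rewards, costs, lam):
    # Assumed scalarized variant: form r - lambda * c first, then normalize once,
    # so lambda keeps its meaning as an exchange rate between reward and cost.
    scalar = [r - lam * c for r, c in zip(rewards, costs)]
    return standardize(scalar)


# One toy group of four sampled responses; only the last one violates the constraint.
rewards = [0.0, 100.0, 200.0, 300.0]
costs = [0, 0, 0, 1]
lam = 1.0

naive = naive_advantage(rewards, costs, lam)
scal = scalarized_advantage(rewards, costs, lam)
best_naive = max(range(len(naive)), key=naive.__getitem__)
best_scal = max(range(len(scal)), key=scal.__getitem__)
print(best_naive, best_scal)  # 2 3
```

With λ = 1, a violation should cost one unit of reward, so the last response (reward 300, one violation) should still rank first, as it does under the scalarized advantage. Under the naive advantage, σ_r/σ_c ≈ 258 in this group, so the violation is penalized as if λ were about 258 and the ranking flips. In other groups the ratio can just as easily shrink the penalty, which is why the distortion interferes with the signal the dual update is trying to send.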

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research