Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients

Matias Alvo; Daniel Russo; Yash Kanoria

arXiv:2605.14297 · cs.LG · May 15, 2026

TL;DR

This paper introduces Hybrid Policy Optimization (HPO), a reinforcement learning method for hybrid discrete-continuous action spaces. HPO uses an unbiased mixed gradient estimator that combines pathwise and score-function gradients, and it outperforms PPO on inventory control and switched LQ problems.

Contribution

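HPO combines pathwise gradients, backpropagated through the simulator wherever smoothness permits, with score-function gradients for hybrid actions. This addresses the credit-assignment issues of purely score-function methods and enables scalable, unbiased policy optimization in hybrid action spaces.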

Findings

01. HPO outperforms PPO on inventory control and switched LQ problems.

02. The performance gap widens as the dimension of the continuous action grows.

03. The cross term of the mixed gradient diminishes near a discrete best response, which enables decentralized updates.

Abstract

We study reinforcement learning in hybrid discrete-continuous action spaces, such as settings where the discrete component selects a regime (or index) and the continuous component optimizes within it -- a structure common in robotics, control, and operations problems. Standard model-free policy gradient methods rely on score-function (SF) estimators and suffer from severe credit-assignment issues in high-dimensional settings, leading to poor gradient quality. On the other hand, differentiable simulation largely sidesteps these issues by backpropagating through a simulator, but the presence of discrete actions or non-smooth dynamics yields biased or uninformative gradients. To address this, we propose Hybrid Policy Optimization (HPO), which backpropagates through the simulator wherever smoothness permits, using a mixed gradient estimator that combines pathwise and SF gradients while…
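The idea of mixing the two estimator types can be illustrated on a toy one-step problem. The sketch below is not the authors' HPO implementation; the problem, the function names (`mixed_gradient`, `exact_gradient`), and the reward are all assumptions made for illustration. A discrete action `k` is drawn from a softmax over logits `theta`, and a continuous action `a = mu[k] + sigma * eps` is drawn by reparameterization. The gradient with respect to `theta` uses the score-function (SF) estimator, while the gradient with respect to `mu` is taken pathwise through the (here trivially differentiable) reward.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def mixed_gradient(theta, mu, sigma, targets, bonuses, n_samples, rng):
    """Monte Carlo gradient of E[r(k, a)] for a toy hybrid policy (illustrative only).

    Reward (an assumption of this sketch): r(k, a) = bonuses[k] - (a - targets[k])**2.
    - d/d theta: score-function estimator, r * grad log pi(k).
    - d/d mu:    pathwise estimator, dr/da * da/dmu[k] with a = mu[k] + sigma * eps.
    """
    n_actions = len(theta)
    probs = softmax(theta)
    g_theta = [0.0] * n_actions
    g_mu = [0.0] * n_actions
    for _ in range(n_samples):
        k = rng.choices(range(n_actions), weights=probs)[0]  # discrete action
        eps = rng.gauss(0.0, 1.0)
        a = mu[k] + sigma * eps                              # reparameterized continuous action
        r = bonuses[k] - (a - targets[k]) ** 2
        # Score-function part: grad_theta_j log softmax(theta)[k] = 1[j == k] - probs[j].
        for j in range(n_actions):
            g_theta[j] += r * ((1.0 if j == k else 0.0) - probs[j])
        # Pathwise part: only the sampled branch's mean affects the reward.
        g_mu[k] += -2.0 * (a - targets[k])
    return [g / n_samples for g in g_theta], [g / n_samples for g in g_mu]


def exact_gradient(theta, mu, sigma, targets, bonuses):
    """Closed-form gradient of the same toy objective, for checking the estimator."""
    probs = softmax(theta)
    # Expected reward of each discrete branch under the Gaussian noise.
    branch = [b - (m - c) ** 2 - sigma ** 2 for b, m, c in zip(bonuses, mu, targets)]
    value = sum(p * v for p, v in zip(probs, branch))
    g_theta = [p * (v - value) for p, v in zip(probs, branch)]
    g_mu = [p * (-2.0 * (m - c)) for p, m, c in zip(probs, mu, targets)]
    return g_theta, g_mu


if __name__ == "__main__":
    rng = random.Random(0)
    args = ([0.2, -0.1], [0.5, 1.5], 0.5, [1.0, 1.0], [0.0, 0.3])
    print("estimate:", mixed_gradient(*args, n_samples=100_000, rng=rng))
    print("exact:   ", exact_gradient(*args))
```

Both parts of the estimator are unbiased in this toy setting, so the Monte Carlo estimate should approach the closed-form gradient as the sample count grows. The sketch omits baselines, multi-step rollouts, and non-smooth dynamics, all of which matter in the settings the abstract describes.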

Code & Models

Repositories

MatiasAlvo/hybrid-rl (GitHub)
