Hybrid TD3: Overestimation Bias Analysis and Stable Policy Optimization for Hybrid Action Space

Thanh-Tuan Tran; Thanh Nguyen Canh; Nak Young Chong; Xiem HoangVan

arXiv:2603.01302 · cs.RO · March 3, 2026

TL;DR

This paper introduces Hybrid TD3, a reinforcement learning algorithm designed for hybrid action spaces, providing theoretical analysis of overestimation bias and demonstrating improved stability and performance in robotic manipulation tasks.

Contribution

Hybrid TD3 extends TD3 to handle hybrid action spaces natively, with a novel bias reduction technique and comprehensive theoretical analysis of overestimation bias.

Findings

01  Hybrid TD3 reduces overestimation bias in hybrid action spaces.

02  The method achieves more stable training compared to existing approaches.

03  Experimental results show competitive performance in robotic manipulation tasks.

Abstract

Reinforcement learning in discrete-continuous hybrid action spaces presents fundamental challenges for robotic manipulation, where high-level task decisions and low-level joint-space execution must be jointly optimized. Existing approaches either discretize continuous components or relax discrete choices into continuous approximations, which suffer from scalability limitations and training instability in high-dimensional action spaces and under domain randomization. In this paper, we propose Hybrid TD3, an extension of Twin Delayed Deep Deterministic Policy Gradient (TD3) that natively handles parameterized hybrid action spaces in a principled manner. We conduct a rigorous theoretical analysis of overestimation bias in hybrid action settings, deriving formal bounds under twin-critic architectures and establishing a complete bias ordering across five algorithmic variants. Building on…
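
As a rough illustration of why a twin-critic architecture matters for overestimation bias, the sketch below computes the standard TD3 clipped double-Q target for a parameterized hybrid action and compares it with a single-critic target. It is not the paper's Hybrid TD3 update, which the truncated abstract does not specify. The array layout (one critic value per discrete action, each paired with that action's continuous parameters), the function names, and the numbers are all assumptions made for illustration.

```python
import numpy as np


def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Standard TD3-style target, adapted to a hypothetical hybrid layout.

    q1_next, q2_next: arrays of shape (batch, n_discrete). Column k is assumed
    to hold a critic's value for discrete action k combined with the actor's
    continuous parameters for k at the next state.
    """
    q_min = np.minimum(q1_next, q2_next)  # pessimistic estimate per discrete action
    best = q_min.max(axis=1)              # greedy choice over discrete actions
    return reward + gamma * (1.0 - done) * best


def single_critic_target(reward, done, q1_next, gamma=0.99):
    """Baseline target that trusts one critic only (prone to overestimation)."""
    return reward + gamma * (1.0 - done) * q1_next.max(axis=1)


# Illustrative values: the two critics disagree strongly on discrete action 1.
reward = np.array([0.0])
done = np.array([0.0])
q1_next = np.array([[1.0, 3.0]])
q2_next = np.array([[2.0, 0.5]])

twin = clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.9)
single = single_critic_target(reward, done, q1_next, gamma=0.9)
```

Because the element-wise minimum can never exceed either critic, the twin target is always at most the single-critic target; here the optimistic value 3.0 for discrete action 1 is discarded in favor of the critics' agreed lower estimate.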

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning