Hybrid TD3: Overestimation Bias Analysis and Stable Policy Optimization for Hybrid Action Space
Thanh-Tuan Tran, Thanh Nguyen Canh, Nak Young Chong, Xiem HoangVan

TL;DR
This paper introduces Hybrid TD3, a reinforcement learning algorithm designed for hybrid action spaces, providing theoretical analysis of overestimation bias and demonstrating improved stability and performance in robotic manipulation tasks.
Contribution
Hybrid TD3 extends TD3 to handle hybrid action spaces natively, with a novel bias reduction technique and comprehensive theoretical analysis of overestimation bias.
Findings
Hybrid TD3 reduces overestimation bias in hybrid action spaces.
The method trains more stably than existing approaches.
Experimental results show competitive performance in robotic manipulation tasks.
Abstract
Reinforcement learning in discrete-continuous hybrid action spaces presents fundamental challenges for robotic manipulation, where high-level task decisions and low-level joint-space execution must be jointly optimized. Existing approaches either discretize continuous components or relax discrete choices into continuous approximations, which suffer from scalability limitations and training instability in high-dimensional action spaces and under domain randomization. In this paper, we propose Hybrid TD3, an extension of Twin Delayed Deep Deterministic Policy Gradient (TD3) that natively handles parameterized hybrid action spaces in a principled manner. We conduct a rigorous theoretical analysis of overestimation bias in hybrid action settings, deriving formal bounds under twin-critic architectures and establishing a complete bias ordering across five algorithmic variants. Building on…
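The overestimation effect that the abstract refers to can be illustrated numerically. The sketch below is not the paper's derivation or the Hybrid TD3 algorithm. It is a toy setup, assumed only for illustration, in which every discrete action has a true value of 0 and each critic observes that value with independent Gaussian noise. Taking a greedy max over one critic's noisy estimates is biased upward, while taking the element-wise minimum of two critics before the max (the clipped double-Q idea from TD3) lowers the estimate. The function names, the noise model, and the number of actions are all hypothetical, and the continuous parameters of a hybrid action are omitted.

```python
import random
import statistics


def noisy_estimates(true_values, noise_std, rng):
    """Return one critic's noisy estimate of each discrete action's value."""
    return [v + rng.gauss(0.0, noise_std) for v in true_values]


def mean_greedy_values(num_actions=4, noise_std=1.0, trials=20000, seed=0):
    """Average greedy value under a single critic and under a twin-critic minimum.

    Assumption for illustration: all true action values are 0, so the true
    greedy value is 0 and any positive average is pure overestimation.
    """
    rng = random.Random(seed)
    true_values = [0.0] * num_actions
    single, twin = [], []
    for _ in range(trials):
        q1 = noisy_estimates(true_values, noise_std, rng)
        q2 = noisy_estimates(true_values, noise_std, rng)
        # Single critic: greedy max over noisy estimates.
        single.append(max(q1))
        # Twin critics: element-wise minimum first, then greedy max.
        twin.append(max(min(a, b) for a, b in zip(q1, q2)))
    return statistics.fmean(single), statistics.fmean(twin)


single_value, twin_value = mean_greedy_values()
print(f"single-critic greedy value: {single_value:.3f}")
print(f"twin-critic greedy value:   {twin_value:.3f}")
```

Because the true greedy value is 0 in this toy setup, the printed single-critic average is a direct measure of overestimation, and the twin-critic average shows how much the minimum operator pulls it down. How the bias behaves once continuous parameters and learned actors are involved is what the paper's analysis addresses, and this sketch does not attempt to reproduce it.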
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning
