FastDSAC: Unlocking the Potential of Maximum Entropy RL in High-Dimensional Humanoid Control

Jun Xue; Junze Wang; Shanze Wang; Xinming Zhang; Yanjun Chen; Wei Zhang

arXiv:2603.12612·cs.LG·May 5, 2026

TL;DR

FastDSAC enhances maximum entropy RL for high-dimensional humanoid control by introducing dynamic exploration and a specialized critic, achieving state-of-the-art results and outperforming deterministic methods on complex tasks.

Contribution

The paper presents FastDSAC, a framework that pairs Dimension-wise Entropy Modulation (DEM) with a continuous distributional critic to improve exploration and value estimation in high-dimensional stochastic RL.

Findings

01. FastDSAC achieves state-of-the-art performance on HumanoidBench.

02. It outperforms deterministic baselines by 180% on Basketball and 350% on Balance Hard.

03. The method demonstrates significant improvements in exploration efficiency and training stability.

Abstract

Scaling Maximum Entropy Reinforcement Learning (RL) to high-dimensional humanoid control remains a fundamental challenge, as the "curse of dimensionality" induces severe exploration inefficiency and training instability. Consequently, highly optimized deterministic policy gradients currently dominate high-throughput regimes. We address this limitation with FastDSAC, a framework that effectively unlocks the potential of maximum entropy stochastic policies for complex continuous control. We introduce Dimension-wise Entropy Modulation (DEM) to dynamically redistribute the exploration budget, alongside a continuous distributional critic tailored to ensure accurate value estimation by mitigating both high-dimensional overestimation and discrete quantization artifacts. Extensive evaluations on HumanoidBench and a diverse set of continuous control tasks demonstrate that FastDSAC establishes…
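
As a rough illustration of what redistributing an exploration budget across action dimensions can mean, the sketch below relies on a standard fact: the entropy of a diagonal Gaussian policy is the sum of its per-dimension entropies. The function names, the `weights` input, and the zero-mean shifting rule are all assumptions made for illustration. This is not the paper's DEM, whose details are not given in the abstract.

```python
import math


def gaussian_entropy_per_dim(log_stds):
    """Differential entropy of each dimension of a diagonal Gaussian.

    For one dimension with standard deviation sigma:
        H = 0.5 * log(2 * pi * e) + log(sigma)
    """
    const = 0.5 * math.log(2.0 * math.pi * math.e)
    return [const + ls for ls in log_stds]


def redistribute_entropy(log_stds, weights):
    """Hypothetical rule (an assumption, not the paper's DEM).

    Shifts each log-std by its weight minus the mean weight. The shifts
    sum to zero, so the total policy entropy is unchanged while
    dimensions with above-average weight receive more of it.
    """
    mean_w = sum(weights) / len(weights)
    return [ls + (w - mean_w) for ls, w in zip(log_stds, weights)]


# Usage: three action dimensions, all starting at sigma = 1 (log-std 0).
old_log_stds = [0.0, 0.0, 0.0]
new_log_stds = redistribute_entropy(old_log_stds, weights=[1.0, 0.0, -1.0])

total_before = sum(gaussian_entropy_per_dim(old_log_stds))
total_after = sum(gaussian_entropy_per_dim(new_log_stds))
```

In this toy setup the first dimension explores more and the third less, while `total_before` and `total_after` agree up to floating-point error. How the weights would be chosen per state or per dimension is exactly the part a real method has to specify.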
