FastDSAC: Unlocking the Potential of Maximum Entropy RL in High-Dimensional Humanoid Control
Jun Xue, Junze Wang, Shanze Wang, Xinming Zhang, Yanjun Chen, Wei Zhang

TL;DR
FastDSAC makes maximum entropy RL competitive in high-dimensional humanoid control by dynamically redistributing exploration across action dimensions and adding a specialized distributional critic. It achieves state-of-the-art results and outperforms deterministic methods on complex tasks.
Contribution
It presents FastDSAC, a framework that combines Dimension-wise Entropy Modulation (DEM) with a continuous distributional critic to improve exploration and value estimation in high-dimensional stochastic RL.
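The summary does not say how DEM works internally. The sketch below shows the quantity any per-dimension scheme would act on, for reference only. The entropy of a diagonal Gaussian policy splits into a sum of per-dimension terms. Standard SAC, by contrast, tunes a single temperature against one global target, commonly -|A|. The per-dimension temperature step in `update_log_alphas` is a hypothetical illustration of "dimension-wise" entropy control, not FastDSAC's DEM rule. The function names, the target of -1 per dimension, and the learning rate are all assumptions. tanh squashing is ignored.

```python
import math
from typing import List

# 0.5 * log(2*pi*e): entropy of a unit-variance 1-D Gaussian.
LOG_2PIE_HALF = 0.5 * math.log(2.0 * math.pi * math.e)


def gaussian_entropy_per_dim(log_stds: List[float]) -> List[float]:
    # Differential entropy of each independent (diagonal) Gaussian dimension,
    # before any tanh squashing: H_i = 0.5*log(2*pi*e) + log(sigma_i).
    # Their sum equals the joint entropy of the diagonal Gaussian.
    return [LOG_2PIE_HALF + ls for ls in log_stds]


def update_log_alphas(log_alphas: List[float], entropies: List[float],
                      target_per_dim: float, lr: float = 0.01) -> List[float]:
    # HYPOTHETICAL per-dimension analogue of SAC's temperature step.
    # The loss log_alpha_i * (H_i - target) has gradient (H_i - target), so a
    # gradient-descent step raises alpha_i when dimension i has less entropy
    # than its target and lowers it when it has more.
    return [la - lr * (h - target_per_dim) for la, h in zip(log_alphas, entropies)]


if __name__ == "__main__":
    log_stds = [0.5, -2.0, -0.5]  # made-up policy output for a 3-D action
    ent = gaussian_entropy_per_dim(log_stds)
    print("per-dim entropy:", [round(h, 3) for h in ent])
    print("joint entropy:", round(sum(ent), 3))
    # Splitting SAC's common -|A| heuristic evenly gives -1 per dimension.
    print("updated log-alphas:", update_log_alphas([0.0] * 3, ent, target_per_dim=-1.0))
```

With a single global temperature, a few high-entropy dimensions can offset near-deterministic ones. Tracking the per-dimension terms makes that imbalance visible, which is the kind of redistribution the abstract attributes to DEM.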
Findings
FastDSAC achieves state-of-the-art performance on HumanoidBench.
It outperforms deterministic baselines by 180% on Basketball and 350% on Balance Hard.
The method demonstrates significant improvements in exploration efficiency and training stability.
Abstract
Scaling Maximum Entropy Reinforcement Learning (RL) to high-dimensional humanoid control remains a fundamental challenge, as the "curse of dimensionality" induces severe exploration inefficiency and training instability. Consequently, highly optimized deterministic policy gradient methods currently dominate high-throughput regimes. We address this limitation with FastDSAC, a framework that effectively unlocks the potential of maximum entropy stochastic policies for complex continuous control. We introduce Dimension-wise Entropy Modulation (DEM) to dynamically redistribute the exploration budget, alongside a continuous distributional critic tailored to ensure accurate value estimation by mitigating both high-dimensional overestimation and discrete quantization artifacts. Extensive evaluations on HumanoidBench and a diverse set of continuous control tasks demonstrate that FastDSAC establishes…
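The abstract does not give the critic's exact form. One common way to build a continuous distributional critic is to predict a Gaussian over the return, with a mean and a log standard deviation. This avoids a categorical distribution over a fixed grid of atoms, which is where discretization artifacts come from. The sketch below assumes that Gaussian form and fits it to a soft (maximum entropy) Bellman target by negative log-likelihood. `soft_td_target` and `gaussian_nll` are hypothetical names. FastDSAC's actual loss, including how it counters overestimation, may differ.

```python
import math


def soft_td_target(reward: float, done: float, gamma: float,
                   next_q_mean: float, next_log_prob: float, alpha: float) -> float:
    # Soft Bellman target from maximum entropy RL:
    #   y = r + gamma * (1 - done) * (Q(s', a') - alpha * log pi(a' | s'))
    # next_q_mean is the critic's predicted mean return at (s', a').
    return reward + gamma * (1.0 - done) * (next_q_mean - alpha * next_log_prob)


def gaussian_nll(target: float, mean: float, log_std: float) -> float:
    # Negative log-likelihood of a scalar target under N(mean, exp(log_std)^2).
    # The return distribution is continuous, so there is no fixed atom grid
    # to project onto (ASSUMED parameterization, for illustration only).
    var = math.exp(2.0 * log_std)
    return 0.5 * math.log(2.0 * math.pi) + log_std + 0.5 * (target - mean) ** 2 / var


if __name__ == "__main__":
    # Made-up numbers for a single transition.
    y = soft_td_target(reward=1.0, done=0.0, gamma=0.99,
                       next_q_mean=10.0, next_log_prob=-2.0, alpha=0.2)
    print("soft TD target:", round(y, 3))
    print("NLL, critic mean near target:", round(gaussian_nll(y, 11.0, 0.0), 4))
    print("NLL, critic mean far from target:", round(gaussian_nll(y, 5.0, 0.0), 4))
```

The learned standard deviation lets the critic express return uncertainty without committing to a support range in advance. In a full agent, both functions would operate on batched tensors, and `mean` and `log_std` would come from a network head.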
