Rethinking Soft Actor-Critic in High-Dimensional Action Spaces: The Cost of Ignoring Distribution Shift
Yanjun Chen, Xinming Zhang, Xianghui Wang, Zhiqiang Xu, Xiaoyu Shen, Wei Zhang

TL;DR
This paper analyzes the distribution shift caused by the tanh transformation in Soft Actor-Critic algorithms, revealing its impact on performance in high-dimensional tasks and proposing improvements to address this issue.
Contribution
It provides a theoretical derivation of the action distribution's PDF after tanh transformation and demonstrates how accounting for this shift improves SAC's effectiveness.
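For orientation, the density of a tanh-squashed Gaussian follows from the standard change-of-variables rule. The sketch below is the textbook one-dimensional form, not necessarily the paper's notation; the symbols $u$, $a$, $\mu$, and $\sigma$ are assumptions introduced here, with $u \sim \mathcal{N}(\mu, \sigma^2)$ the pre-squash sample and $a = \tanh(u)$ the bounded action.

```latex
% Assumed setup: u ~ N(mu, sigma^2), a = tanh(u), so u = arctanh(a) and du/da = 1 / (1 - a^2).
\[
p_a(a)
  = p_u\bigl(\operatorname{arctanh}(a)\bigr)\,\left|\frac{du}{da}\right|
  = \frac{1}{\sigma\sqrt{2\pi}}
    \exp\!\left(-\frac{\bigl(\operatorname{arctanh}(a)-\mu\bigr)^{2}}{2\sigma^{2}}\right)
    \frac{1}{1-a^{2}},
  \qquad a \in (-1, 1).
\]
```

The factor $1/(1-a^{2})$ grows as $|a|$ approaches 1, which is why the squashed density is no longer Gaussian in shape and its peak need not sit at $\tanh(\mu)$.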
Findings
The tanh-induced distribution shift distorts the Gaussian action distribution.
Correcting for the shift improves cumulative rewards.
Performance improves on high-dimensional control tasks.
Abstract
The Soft Actor-Critic (SAC) algorithm is widely recognized for its robust performance across a range of deep reinforcement learning tasks, where it uses the tanh transformation to constrain actions within bounded limits. However, this transformation induces a distribution shift: it distorts the original Gaussian action distribution and can lead the policy to select suboptimal actions, particularly in high-dimensional action spaces. In this paper, we conduct a comprehensive theoretical and empirical analysis of this distribution shift, deriving the precise probability density function (PDF) for actions following the tanh transformation to clarify the misalignment introduced between the transformed distribution's mode and the intended action output. We substantiate these theoretical insights through extensive experiments on high-dimensional tasks within the HumanoidBench benchmark.…
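The mode misalignment described in the abstract can be illustrated numerically in one dimension. The following is a minimal sketch, not the paper's implementation: it assumes a scalar Gaussian with hypothetical parameters `mu = 0.5` and `sigma = 0.5`, evaluates the standard change-of-variables density of `a = tanh(u)`, and compares the density's peak (found by a simple grid search) with the commonly used output `tanh(mu)`. The function names are illustrative.

```python
import math


def tanh_gaussian_pdf(a, mu, sigma):
    """Density of a = tanh(u) with u ~ N(mu, sigma^2), valid for -1 < a < 1."""
    u = math.atanh(a)  # invert the squashing
    gauss = math.exp(-0.5 * ((u - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    return gauss / (1.0 - a * a)  # Jacobian |du/da| = 1 / (1 - a^2)


def grid_mode(mu, sigma, n=20001):
    """Approximate the mode of the squashed density on a midpoint grid over (-1, 1)."""
    grid = [-1.0 + 2.0 * (i + 0.5) / n for i in range(n)]
    return max(grid, key=lambda a: tanh_gaussian_pdf(a, mu, sigma))


# Hypothetical parameters chosen only for illustration.
mu, sigma = 0.5, 0.5
naive_action = math.tanh(mu)         # squashed Gaussian mean
mode_action = grid_mode(mu, sigma)   # peak of the squashed density

print(f"tanh(mu)          = {naive_action:.4f}")
print(f"mode of p(a)      = {mode_action:.4f}")
print(f"gap (mode - tanh) = {mode_action - naive_action:.4f}")
```

In this setting the peak of the squashed density lies closer to the action bound than `tanh(mu)`, and the gap widens as `sigma` grows. Setting the derivative of the log-density to zero gives the stationarity condition `atanh(a) = mu + 2 * sigma**2 * a`, which the grid result should satisfy up to grid resolution.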
Taxonomy
Topics: Control and Stability of Dynamical Systems
Methods: Average Pooling · Dilated Convolution · Convolution · Global Average Pooling · 1x1 Convolution · Switchable Atrous Convolution
