Rethinking Soft Actor-Critic in High-Dimensional Action Spaces: The Cost of Ignoring Distribution Shift

Yanjun Chen; Xinming Zhang; Xianghui Wang; Zhiqiang Xu; Xiaoyu Shen; Wei Zhang

arXiv:2410.16739 · cs.LG · April 23, 2025

TL;DR

This paper analyzes the distribution shift caused by the tanh transformation in Soft Actor-Critic (SAC), shows how it affects performance on high-dimensional tasks, and proposes improvements to address it.

Contribution

It provides a theoretical derivation of the PDF of the action distribution after the tanh transformation and demonstrates that accounting for this shift improves SAC's effectiveness.
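For reference, the standard change-of-variables form of this density is written out below. It assumes, as in common SAC implementations, a diagonal Gaussian pre-activation u ~ N(μ(s), diag(σ(s)²)) over D action dimensions and an elementwise squashing a = tanh(u). This is a textbook sketch, not a reproduction of the paper's derivation, whose notation and final form may differ.

```latex
\begin{aligned}
p_A(a) &= \mathcal{N}\!\left(\operatorname{atanh}(a);\, \mu, \operatorname{diag}(\sigma^2)\right)
          \prod_{i=1}^{D} \frac{1}{1 - a_i^2}, \qquad a \in (-1, 1)^D, \\
\log \pi(a \mid s) &= \log \mathcal{N}\!\left(u;\, \mu, \operatorname{diag}(\sigma^2)\right)
          - \sum_{i=1}^{D} \log\!\left(1 - \tanh^2(u_i)\right), \qquad u = \operatorname{atanh}(a), \\
\frac{\partial}{\partial a_i} \log p_A(a) = 0
   \;&\Longleftrightarrow\; u_i - \mu_i = 2\sigma_i^2 \tanh(u_i).
\end{aligned}
```

The last line is the per-dimension stationarity condition for the mode of the squashed density. Setting u_i = μ_i satisfies it only when μ_i = 0 (or as σ_i → 0), which is why the action tanh(μ) generally does not coincide with the mode of the transformed distribution.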

Findings

1. Distribution shift distorts the Gaussian action distribution.
2. Correcting for the shift improves cumulative rewards.
3. Enhanced performance in high-dimensional control tasks.
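The first finding can be illustrated numerically for a single action dimension. The sketch below evaluates the density of a = tanh(u), u ~ N(μ, σ²), on a fine grid and compares its peak with tanh(μ). The values μ = 1.0 and σ = 0.5 are arbitrary assumptions chosen for illustration, and `squashed_gaussian_pdf` is a hypothetical helper, not code from the paper.

```python
import numpy as np


def squashed_gaussian_pdf(a, mu, sigma):
    """Density of a = tanh(u) with u ~ N(mu, sigma^2), evaluated for a in (-1, 1)."""
    a = np.asarray(a, dtype=float)
    u = np.arctanh(a)
    gauss = np.exp(-0.5 * ((u - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return gauss / (1.0 - a ** 2)  # Jacobian |du/da| = 1 / (1 - a^2)


mu, sigma = 1.0, 0.5  # assumed pre-squash mean and std for one action dimension
grid = np.linspace(-0.9999, 0.9999, 200_001)
density = squashed_gaussian_pdf(grid, mu, sigma)

naive_action = np.tanh(mu)               # tanh applied to the Gaussian mean
density_mode = grid[np.argmax(density)]  # most likely action after squashing
print(f"tanh(mu) = {naive_action:.3f}, mode of squashed density = {density_mode:.3f}")
```

With these assumed parameters the grid maximum lands near 0.895, well above tanh(1) ≈ 0.762, so the action obtained by squashing the mean is not the most likely action under the squashed policy.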

Abstract

The Soft Actor-Critic (SAC) algorithm is widely recognized for its robust performance across a range of deep reinforcement learning tasks, and it uses the tanh transformation to constrain actions within bounded limits. However, this transformation induces a distribution shift that distorts the original Gaussian action distribution and can lead the policy to select suboptimal actions, particularly in high-dimensional action spaces. In this paper, we conduct a comprehensive theoretical and empirical analysis of this distribution shift, deriving the precise probability density function (PDF) of actions after the tanh transformation to clarify the misalignment between the mode of the transformed distribution and the intended action output. We substantiate these theoretical insights through extensive experiments on high-dimensional tasks in the HumanoidBench benchmark.…
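Why the problem can grow with the number of action dimensions can be motivated with a back-of-the-envelope sketch. Assume, hypothetically, that every dimension shares the same pre-squash mean μ = 1.0 and standard deviation σ = 0.5, and that dimensions are independent as in a diagonal Gaussian policy. Then the log-density gap between the true mode and tanh(μ) adds up across dimensions. The helper names are illustrative; the mode is found by bisection on the stationarity condition u - μ = 2σ² tanh(u), which has a unique root only in the unimodal regime 2σ² < 1. None of this is the paper's own analysis or code.

```python
import math


def squashed_log_pdf_1d(a, mu, sigma):
    """log p_A(a) for a = tanh(u), u ~ N(mu, sigma^2), with a in (-1, 1)."""
    u = math.atanh(a)
    log_gauss = -0.5 * ((u - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
    return log_gauss - math.log(1.0 - a * a)  # change-of-variables correction


def squashed_mode_1d(mu, sigma, tol=1e-12):
    """Solve u - mu = 2 sigma^2 tanh(u) by bisection and return a = tanh(u).

    Assumes 0 < 2 sigma^2 < 1, where the left side minus the right side is
    strictly increasing in u, so the root (and the mode) is unique.
    """
    k = 2.0 * sigma ** 2
    if sigma <= 0.0 or k >= 1.0:
        raise ValueError("sketch assumes 0 < 2*sigma^2 < 1")
    lo, hi = mu - k, mu + k  # |tanh| < 1 guarantees the root lies in this bracket
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - mu - k * math.tanh(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return math.tanh(0.5 * (lo + hi))


def log_density_gap(dim, mu, sigma):
    """log p(mode) - log p(tanh(mu)) for `dim` independent, identically parameterized dims."""
    mode = squashed_mode_1d(mu, sigma)
    per_dim = squashed_log_pdf_1d(mode, mu, sigma) - squashed_log_pdf_1d(math.tanh(mu), mu, sigma)
    return dim * per_dim  # log-densities of independent dimensions add


for dim in (1, 10, 100):
    gap = log_density_gap(dim, mu=1.0, sigma=0.5)
    print(f"dim={dim:3d}  log-density gap={gap:7.2f}  density ratio={math.exp(gap):.3g}")
```

With these assumed values the per-dimension gap is roughly 0.35 nats, so the density ratio between the mode and tanh(μ) is modest in one dimension but very large at a hundred. This only illustrates how per-dimension discrepancies compound through a product density; it does not quantify the reward cost reported on HumanoidBench.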

Taxonomy

Topics: Control and Stability of Dynamical Systems

Methods: Average Pooling · Dilated Convolution · Convolution · Global Average Pooling · 1x1 Convolution · Switchable Atrous Convolution