SACn: Soft Actor-Critic with n-step Returns

Jakub Łyskawa; Jakub Lewandowski; Paweł Wawrzyński

arXiv:2512.13165·cs.LG·December 16, 2025

Open Access

TL;DR

This paper introduces SACn, a variant of Soft Actor-Critic (SAC) that incorporates n-step returns through numerically stable importance sampling, which speeds up convergence on reinforcement learning tasks.

Contribution

The paper presents a method for combining SAC with n-step returns. It relies on numerically stable importance sampling and an entropy estimation technique, and it addresses the numerical instability that importance sampling can otherwise introduce.
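For readers unfamiliar with n-step returns, the following is a minimal sketch of the plain, uncorrected n-step target: n discounted rewards plus a discounted bootstrap value. It is not the SACn target, which additionally involves importance sampling and entropy terms that this page does not spell out. The function name `n_step_return` and its parameters are hypothetical.

```python
from typing import Sequence


def n_step_return(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Plain n-step return: r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * V.

    Assumption: no importance weights and no entropy bonus are applied here,
    so this is only the generic building block, not the method from the paper.
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


# Three rewards of 1.0, a bootstrap value of 10.0, and a discount of 0.5.
example_target = n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5)
```

With n = 1 this reduces to the usual 1-step target `r + gamma * V`. Larger n propagates reward information faster, but in an off-policy setting the stored actions may no longer match the current policy, which is where importance sampling comes in.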

Findings

01. SACn accelerates convergence in MuJoCo environments.

02. The importance sampling method improves stability in n-step SAC.

03. Entropy estimation reduces variance in learning targets.

Abstract

Soft Actor-Critic (SAC) is widely used in practical applications and is now one of the most relevant off-policy online model-free reinforcement learning (RL) methods. The technique of n-step returns is known to increase the convergence speed of RL algorithms compared to their 1-step returns-based versions. However, SAC is notoriously difficult to combine with n-step returns, since their usual combination introduces bias in off-policy algorithms due to the changes in action distribution. While this problem is solved by importance sampling, a method for estimating expected values of one distribution using samples from another distribution, importance sampling may result in numerical instability. In this work, we combine SAC with n-step returns in a way that overcomes this issue. We present an approach to applying numerically stable importance sampling with simplified hyperparameter…
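The abstract describes importance sampling as estimating expected values under one distribution using samples from another. A minimal sketch of that idea follows, using two one-dimensional Gaussians as stand-ins for the target and sampling distributions. The log-space weights and the self-normalized variant are generic stabilization techniques chosen for illustration; they are an assumption here and not necessarily the scheme used in SACn. All function names are hypothetical.

```python
import math
import random


def gaussian_log_pdf(x: float, mean: float, std: float) -> float:
    """Log-density of a univariate normal distribution."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def importance_sampling_estimates(samples, f, log_p, log_q):
    """Estimate E_p[f(x)] from samples drawn from q.

    Returns (ordinary, self_normalized). Both are generic textbook estimators,
    used here only for illustration.
    """
    # Work with log-weights so density ratios are formed by subtraction.
    log_w = [log_p(x) - log_q(x) for x in samples]

    # Ordinary estimator: average of w(x) * f(x).
    ordinary = sum(math.exp(lw) * f(x) for lw, x in zip(log_w, samples)) / len(samples)

    # Self-normalized estimator: subtracting the largest log-weight keeps every
    # exponent <= 0, and the shift cancels between numerator and denominator.
    max_lw = max(log_w)
    shifted = [math.exp(lw - max_lw) for lw in log_w]
    self_normalized = sum(w * f(x) for w, x in zip(shifted, samples)) / sum(shifted)
    return ordinary, self_normalized


# Target p = N(1, 1); sampling distribution q = N(0, 2^2). The true E_p[x] is 1.
rng = random.Random(0)
samples = [rng.gauss(0.0, 2.0) for _ in range(20000)]
ordinary, self_normalized = importance_sampling_estimates(
    samples,
    f=lambda x: x,
    log_p=lambda x: gaussian_log_pdf(x, 1.0, 1.0),
    log_q=lambda x: gaussian_log_pdf(x, 0.0, 2.0),
)
```

Both estimates should land close to 1. In an n-step setting the weight is a product of per-step ratios, so it can grow or shrink quickly with n, which is the kind of numerical instability the abstract refers to.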

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research