Average-Reward Soft Actor-Critic

Jacob Adamczyk; Volodymyr Makarenko; Stas Tiomkin; Rahul V. Kulkarni

arXiv:2501.09080 · cs.LG · August 6, 2025

TL;DR

This paper introduces an average-reward soft actor-critic algorithm that extends entropy-regularized reinforcement learning to the average-reward setting, and reports improved performance over existing average-reward algorithms on standard RL benchmarks.

Contribution

The paper develops a deep RL actor-critic method with entropy regularization for the average-reward criterion, a combination that, according to the abstract, had not previously been developed.

Findings

01 Outperforms existing average-reward algorithms on benchmarks

02 Validates the effectiveness of entropy regularization in average-reward RL

03 Provides a new framework for stable average-reward policy learning

Abstract

The average-reward formulation of reinforcement learning (RL) has drawn increased interest in recent years for its ability to solve temporally extended problems without relying on discounting. Meanwhile, in the discounted setting, algorithms with entropy regularization have been developed, leading to improvements over deterministic methods. Despite the distinct benefits of these approaches, deep RL algorithms for the entropy-regularized average-reward objective have not been developed. While policy-gradient-based approaches have recently been presented in the average-reward literature, the corresponding actor-critic framework remains less explored. In this paper, we introduce an average-reward soft actor-critic algorithm to address these gaps in the field. We validate our method by comparing with existing average-reward algorithms on standard RL benchmarks, achieving superior…
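To make the objective named in the abstract concrete, the sketch below evaluates the entropy-regularized average reward of a fixed stochastic policy on a tiny tabular problem. It is not the paper's algorithm, which is a deep actor-critic method. It only illustrates the quantity being optimized, under the standard definition: the long-run average of reward plus a temperature times the policy entropy. The function name `soft_average_reward`, the temperature name `alpha`, and the two-state toy MDP are all assumptions made for illustration, and the code assumes the policy induces a single recurrent class so that a unique stationary distribution exists.

```python
import numpy as np

def soft_average_reward(P, r, pi, alpha):
    """Entropy-regularized average reward of a fixed policy (illustrative sketch).

    P:     transitions, shape (S, A, S), with P[s, a, t] = Pr(t | s, a)
    r:     rewards, shape (S, A)
    pi:    policy, shape (S, A), strictly positive rows summing to 1
    alpha: entropy temperature (hypothetical name)
    """
    # State-to-state transition matrix induced by the policy.
    P_pi = np.einsum("sa,sat->st", pi, P)
    # Stationary distribution: left eigenvector of P_pi for eigenvalue 1.
    evals, evecs = np.linalg.eig(P_pi.T)
    d = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    d = d / d.sum()
    # Per-state expected reward and policy entropy.
    expected_r = (pi * r).sum(axis=1)
    entropy = -(pi * np.log(pi)).sum(axis=1)
    # Average of (reward + alpha * entropy) under the stationary distribution.
    return float(d @ (expected_r + alpha * entropy))

# Hypothetical toy MDP: action 0 stays in place, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, 0, 1] = 1.0
P[1, 1, 0] = 1.0
r = np.array([[1.0, 0.0], [0.0, 2.0]])
pi = np.full((2, 2), 0.5)  # uniform random policy

rho_plain = soft_average_reward(P, r, pi, alpha=0.0)
rho_soft = soft_average_reward(P, r, pi, alpha=0.1)
print(rho_plain, rho_soft)
```

With `alpha=0.0` the function reduces to the ordinary average reward. A positive `alpha` adds a bonus proportional to the policy's entropy, which is the regularization the abstract refers to.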

Taxonomy

Topics: Elevator Systems and Control

Methods: Entropy Regularization