RVI-SAC: Average Reward Off-Policy Deep Reinforcement Learning

Yukinari Hisaki; Isao Ono

arXiv:2408.01972 · cs.LG · August 6, 2024

TL;DR

This paper introduces RVI-SAC, an off-policy deep reinforcement learning algorithm that optimizes the average reward criterion instead of the discounted one. This addresses the mismatch between the training objective and performance metrics in continuing tasks, and the method performs competitively on MuJoCo locomotion benchmarks.

Contribution

RVI-SAC extends Soft Actor-Critic to the average reward setting with novel critic and actor updates, enabling effective learning in continuing tasks.

Findings

1. RVI-SAC performs competitively on MuJoCo locomotion tasks.
2. The method effectively incorporates the average reward criterion into off-policy DRL.
3. Automatic adjustment of the Reset Cost extends the method to tasks with termination.

Abstract

In this paper, we propose an off-policy deep reinforcement learning (DRL) method utilizing the average reward criterion. While most existing DRL methods employ the discounted reward criterion, this can potentially lead to a discrepancy between the training objective and performance metrics in continuing tasks, making the average reward criterion a recommended alternative. We introduce RVI-SAC, an extension of the state-of-the-art off-policy DRL method, Soft Actor-Critic (SAC), to the average reward criterion. Our proposal consists of (1) Critic updates based on RVI Q-learning, (2) Actor updates introduced by the average reward soft policy improvement theorem, and (3) automatic adjustment of Reset Cost enabling the average reward reinforcement learning to be applied to tasks with termination. We apply our method to Gymnasium's MuJoCo tasks, a subset of locomotion tasks, and…
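
As a rough illustration of component (1), the tabular RVI Q-learning update that the critic builds on can be sketched as below. This is a minimal sketch under stated assumptions, not the paper's deep, entropy-regularized implementation: the two-state MDP, the names TRANSITIONS, REFERENCE_PAIR, and rvi_q_learning, the step size, and the uniform sampling of transitions are all hypothetical choices made for illustration. The reference function is assumed to be f(Q) = Q(s0, a0) for one fixed state-action pair, which is one common choice.

```python
import random

# Hypothetical deterministic MDP (illustration only, not from the paper).
# TRANSITIONS[(state, action)] = (next_state, reward)
TRANSITIONS = {
    (0, 0): (1, 0.0),
    (0, 1): (1, 1.0),
    (1, 0): (0, 3.0),
    (1, 1): (1, 1.0),
}
N_ACTIONS = 2
# Assumed reference function: f(Q) = Q[REFERENCE_PAIR].
REFERENCE_PAIR = (0, 0)


def rvi_q_learning(steps=50_000, alpha=0.05, seed=0):
    """Tabular RVI Q-learning on transitions sampled uniformly (off-policy)."""
    rng = random.Random(seed)
    q = {pair: 0.0 for pair in TRANSITIONS}
    pairs = list(TRANSITIONS)
    for _ in range(steps):
        # Sample a stored transition, as one would from a replay buffer.
        s, a = rng.choice(pairs)
        s_next, r = TRANSITIONS[(s, a)]
        # f(Q) plays the role of the average reward estimate.
        rho = q[REFERENCE_PAIR]
        best_next = max(q[(s_next, b)] for b in range(N_ACTIONS))
        # Undiscounted target: reward minus average reward plus next value.
        q[(s, a)] += alpha * (r - rho + best_next - q[(s, a)])
    return q, q[REFERENCE_PAIR]


q_table, avg_reward_estimate = rvi_q_learning()
greedy_policy = {
    s: max(range(N_ACTIONS), key=lambda b: q_table[(s, b)]) for s in (0, 1)
}
print(avg_reward_estimate, greedy_policy)
```

In this toy MDP the best achievable average reward is 2 (alternate between the two states, collecting 3 and then 1), so avg_reward_estimate should approach 2. Note that no discount factor appears in the update: subtracting f(Q) from the target is what keeps the values bounded.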

Code & Models

Repositories

yhisaki/average-reward-drl (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics