Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach

Baturay Saglam; Dogan C. Cicek; Furkan B. Mutlu; Suleyman S. Kozat

arXiv:2208.00755·cs.LG·September 27, 2023

TL;DR

This paper introduces a new one-step correction method for off-policy actor-critic algorithms that reduces bias and improves data efficiency in continuous control tasks, especially with deterministic policies.

Contribution

It proposes a novel policy similarity measure for single-step off-policy correction applicable to deterministic neural policies, addressing limitations of existing importance sampling techniques.

Findings

01. Achieves higher returns with fewer steps compared to existing methods.

02. Demonstrates theoretical guarantees for safe off-policy learning.

03. Improves performance in continuous control benchmarks.

Abstract

Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can improve data efficiency by repeatedly using the previously gathered data. However, off-policy learning becomes challenging when the discrepancy between the underlying distributions of the agent's policy and collected data increases. Although the well-studied importance sampling and off-policy policy gradient techniques were proposed to compensate for this discrepancy, they usually require a collection of long trajectories and induce additional problems such as vanishing/exploding gradients or discarding many useful experiences, which eventually increases the computational complexity. Moreover, their generalization to either continuous action domains or policies approximated by deterministic deep neural networks is strictly limited. To overcome these limitations, we introduce a novel policy…
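The abstract's remark that importance sampling over long trajectories leads to vanishing or exploding terms can be illustrated with a small simulation. The sketch below is not the paper's correction method; it only shows standard trajectory-level importance weights. It assumes one-dimensional Gaussian policies with a fixed mean shift of 0.5 and unit standard deviation, and the helper names (`log_gaussian_pdf`, `trajectory_weight`, `weight_summary`) are hypothetical, chosen for this illustration.

```python
import math
import random
import statistics


def log_gaussian_pdf(x, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def trajectory_weight(horizon, target_mean, behavior_mean, std, rng):
    """Trajectory-level importance weight: product over steps of pi(a) / mu(a).

    Actions are sampled from the behavior policy mu; the product is
    accumulated in log space and exponentiated once at the end.
    """
    log_w = 0.0
    for _ in range(horizon):
        action = rng.gauss(behavior_mean, std)
        log_w += log_gaussian_pdf(action, target_mean, std)
        log_w -= log_gaussian_pdf(action, behavior_mean, std)
    return math.exp(log_w)


def weight_summary(horizon, n_trajectories=2000, seed=0):
    """Return (mean, median, min, max) of the weights over sampled trajectories."""
    rng = random.Random(seed)
    # Assumed toy setup: target policy N(0.5, 1), behavior policy N(0, 1).
    weights = [
        trajectory_weight(horizon, target_mean=0.5, behavior_mean=0.0, std=1.0, rng=rng)
        for _ in range(n_trajectories)
    ]
    return (statistics.mean(weights), statistics.median(weights), min(weights), max(weights))


# Compare the spread of the weights for short and long trajectories.
for horizon in (1, 10, 50):
    mean_w, median_w, min_w, max_w = weight_summary(horizon)
    print(f"H={horizon:3d}  mean={mean_w:.3g}  median={median_w:.3g}  "
          f"min={min_w:.3g}  max={max_w:.3g}")
```

In this toy setting each per-step ratio has expectation 1, yet as the horizon grows the typical (median) weight shrinks toward zero while rare trajectories receive very large weights. That degeneracy is one common motivation for corrections that act on single transitions rather than on whole trajectories.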

Code & Models

Repositories

baturaysaglam/ac-off-poc (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Q-Learning