Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees

Sharan Vaswani; Amirreza Kazemi; Reza Babanezhad; Nicolas Le Roux

arXiv:2305.15249 · cs.LG · November 1, 2023 · 1 citation

TL;DR

This paper introduces a decision-aware actor-critic algorithm that jointly optimizes actor and critic with theoretical guarantees, improving policy performance and stability in reinforcement learning.

Contribution

It proposes a joint objective for training the actor and critic that accommodates any function approximation and guarantees monotonic policy improvement under explicitly characterized conditions.

Findings

01 The critic's decision-aware objective outperforms the standard squared-error objective in bandit examples.

02 The algorithm guarantees monotonic policy improvement under explicitly characterized conditions, regardless of the policy and critic parameterization.

03 Empirical results show improved performance on simple RL tasks.

Abstract

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and any value-based method as the critic. The critic is usually trained by minimizing the TD error, an objective that is potentially decorrelated with the true goal of achieving a high reward with the actor. We address this mismatch by designing a joint objective for training the actor and critic in a decision-aware fashion. We use the proposed objective to design a generic AC algorithm that can easily handle any function approximation. We explicitly characterize the conditions under which the resulting algorithm guarantees monotonic policy improvement, regardless of the choice of the policy and critic parameterization. Instantiating the generic algorithm results in an actor that involves maximizing a sequence of surrogate functions…
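
To make the mismatch described in the abstract concrete, here is a minimal sketch in a two-armed bandit, using squared error as a stand-in for the TD error. The reward values and the two critics are hypothetical numbers chosen for illustration, and this is not the paper's objective or algorithm. It only shows that a critic with lower squared error can still lead to a worse greedy decision.

```python
# Illustrative only: hypothetical rewards and critics, not taken from the paper.

def squared_error(q_hat, rewards):
    """Mean squared error between critic estimates and true mean rewards."""
    return sum((q - r) ** 2 for q, r in zip(q_hat, rewards)) / len(rewards)


def greedy_reward(q_hat, rewards):
    """True reward obtained by acting greedily with respect to the critic."""
    best_arm = max(range(len(q_hat)), key=lambda arm: q_hat[arm])
    return rewards[best_arm]


# Assumed true mean rewards of the two arms.
true_rewards = [1.0, 0.9]

# Critic A is accurate in value but ranks the arms in the wrong order.
critic_a = [0.95, 0.96]

# Critic B is inaccurate in value but ranks the arms correctly.
critic_b = [1.3, 0.6]

for name, critic in [("A", critic_a), ("B", critic_b)]:
    print(
        f"critic {name}: squared error = {squared_error(critic, true_rewards):.5f}, "
        f"greedy reward = {greedy_reward(critic, true_rewards)}"
    )
```

In this toy setup critic A has the lower squared error (about 0.003 versus 0.09), yet acting greedily on it earns 0.9 instead of 1.0. A decision-aware criterion would, under these assumptions, prefer critic B.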

Code & Models

Repositories

amirrezakazemi/acpg (Official)

Videos

Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Trust Region Policy Optimization