Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees
Sharan Vaswani, Amirreza Kazemi, Reza Babanezhad, Nicolas Le Roux

TL;DR
This paper introduces a decision-aware actor-critic algorithm that jointly optimizes actor and critic with theoretical guarantees, improving policy performance and stability in reinforcement learning.
Contribution
It proposes a novel joint objective for actor-critic training that guarantees monotonic policy improvement and handles any function approximation.
Findings
The critic's decision-aware objective outperforms standard squared error in bandit examples.
The algorithm guarantees monotonic policy improvement under certain conditions.
Empirical results show improved performance on simple RL tasks.
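The mismatch behind the first finding can be illustrated with a toy three-armed bandit. This is a minimal sketch and not the paper's decision-aware objective: the reward values and the two critic estimates below are made-up numbers, chosen only to show that a critic with lower squared error can still lead a greedy actor to a worse arm.

```python
import numpy as np

# Hypothetical bandit: true mean reward of each of three arms (illustrative values).
true_rewards = np.array([1.0, 0.9, 0.0])

# Two hypothetical critics (illustrative values, not learned).
critic_a = np.array([0.9, 1.0, 0.0])   # close to the truth, but swaps the top two arms
critic_b = np.array([1.5, 0.4, 0.5])   # far from the truth, but ranks the best arm first


def squared_error(estimate, target):
    """Mean squared error between the critic's estimates and the true rewards."""
    return float(np.mean((estimate - target) ** 2))


def greedy_reward(estimate, target):
    """True reward obtained by an actor that picks the critic's highest-valued arm."""
    return float(target[int(np.argmax(estimate))])


mse_a = squared_error(critic_a, true_rewards)
mse_b = squared_error(critic_b, true_rewards)
reward_a = greedy_reward(critic_a, true_rewards)
reward_b = greedy_reward(critic_b, true_rewards)

print(f"critic A: squared error {mse_a:.4f}, greedy reward {reward_a}")
print(f"critic B: squared error {mse_b:.4f}, greedy reward {reward_b}")
```

Critic A has the smaller squared error yet sends the greedy actor to the second-best arm, while critic B is less accurate overall but yields the optimal decision. A decision-aware critic objective is meant to account for this kind of gap.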
Abstract
Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the TD error, an objective that is potentially decorrelated with the true goal of achieving a high reward with the actor. We address this mismatch by designing a joint objective for training the actor and critic in a decision-aware fashion. We use the proposed objective to design a generic AC algorithm that can easily handle any function approximation. We explicitly characterize the conditions under which the resulting algorithm guarantees monotonic policy improvement, regardless of the choice of the policy and critic parameterization. Instantiating the generic algorithm results in an actor that involves maximizing a sequence of surrogate functions…
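For context, the standard setup the abstract contrasts against can be sketched as a one-step actor-critic update in which the critic follows the TD error. This is a generic textbook-style baseline under stated assumptions (tabular state values, a tabular softmax policy, and hypothetical step sizes and names such as `actor_critic_step`); it is not the decision-aware algorithm proposed in the paper.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def actor_critic_step(theta, v, s, a, r, s_next, done,
                      gamma=0.9, alpha_actor=0.1, alpha_critic=0.5):
    """One-step actor-critic update (baseline sketch, not the paper's method).

    theta: (num_states, num_actions) policy logits; v: (num_states,) state values.
    Returns updated copies of theta and v, plus the TD error.
    """
    theta, v = theta.copy(), v.copy()

    # Critic: TD error, bootstrapping from the next state's value unless terminal.
    bootstrap = 0.0 if done else v[s_next]
    td_error = r + gamma * bootstrap - v[s]
    v[s] += alpha_critic * td_error

    # Actor: policy-gradient step; for a softmax policy,
    # grad of log pi(a|s) w.r.t. the logits is one_hot(a) - pi.
    pi = softmax(theta[s])
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    theta[s] += alpha_actor * td_error * grad_log_pi

    return theta, v, td_error


# Usage on a hypothetical 2-state, 2-action problem with a single transition.
theta0 = np.zeros((2, 2))
v0 = np.zeros(2)
theta1, v1, delta = actor_critic_step(theta0, v0, s=0, a=1, r=1.0, s_next=1, done=False)
print(delta, v1, theta1)
```

Here the critic update depends only on the TD error and never on how its estimates change the actor's decisions, which is the mismatch the abstract describes.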
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Trust Region Policy Optimization
