Variational Actor-Critic Algorithms

Yuhua Zhu; Lexing Ying

arXiv:2108.01215·cs.LG·January 18, 2023

TL;DR

This paper introduces a class of variational actor-critic algorithms that jointly optimize the value function and the policy. It proposes two variants (clipping and flipping) to speed up convergence and shows that the fixed point is close to the optimal policy when the prefactor of the Bellman residual is sufficiently large.

Contribution

It introduces a variational formulation of actor-critic algorithms over both the value function and the policy, proposes two variants (the clipping and flipping methods), and analyzes how the algorithm's fixed point relates to the optimal policy.

Findings

01

The clipping and flipping variants speed up convergence relative to vanilla gradient descent

02

When the Bellman-residual prefactor is sufficiently large, the fixed point is close to the optimal policy

03

The approach unifies value and policy optimization in a variational framework
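The following is a heuristic sketch of why a large Bellman-residual prefactor pushes the fixed point toward optimality. It assumes a tabular MDP with transitions P, rewards r, and discount γ, plus a concrete objective that matches the abstract's description. The state weights ρ and μ, the prefactor name λ, and the use of the policy-evaluation residual δ_π are assumptions, not necessarily the paper's exact definitions:

$$F(V,\pi) = -\sum_s \rho(s)\,V(s) + \frac{\lambda}{2}\sum_s \mu(s)\,\delta_\pi(s)^2, \qquad \delta_\pi = (I-\gamma P^\pi)V - r^\pi,$$

where $P^\pi(s,s') = \sum_a \pi(a\mid s)\,P(s'\mid s,a)$, $r^\pi(s) = \sum_a \pi(a\mid s)\,r(s,a)$, and $M = \mathrm{diag}(\mu)$. Setting the gradient in $V$ to zero gives

$$-\rho + \lambda\,(I-\gamma P^\pi)^\top M\,\delta_\pi = 0 \;\Longrightarrow\; \delta_\pi = \frac{1}{\lambda}\,M^{-1}(I-\gamma P^\pi)^{-\top}\rho,$$

$$V = (I-\gamma P^\pi)^{-1}(r^\pi + \delta_\pi) = V^\pi + (I-\gamma P^\pi)^{-1}\delta_\pi = V^\pi + O(1/\lambda).$$

Because $(I-\gamma P^\pi)^{-\top} = \sum_{k\ge 0}\gamma^k (P^{\pi\top})^k$ is entrywise nonnegative with diagonal entries at least 1, positive $\rho$ and $\mu$ give $\delta_\pi > 0$. The policy derivative $\partial F/\partial\pi(a\mid s) = -\lambda\,\mu(s)\,\delta_\pi(s)\,Q_V(s,a)$, with $Q_V(s,a) = r(s,a) + \gamma\sum_{s'}P(s'\mid s,a)V(s')$, is then most negative for the action with the largest $Q_V$. Descending $F$ in $\pi$ therefore moves probability mass toward $\arg\max_a Q_V(s,a)$. Since $Q_V = Q^\pi + O(1/\lambda)$, a fixed point is a policy that is greedy with respect to a small perturbation of its own Q-function, which suggests near-optimality for large $\lambda$. This is only a heuristic; the paper's precise statement and proof may rely on different assumptions.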

Abstract

We introduce a class of variational actor-critic algorithms based on a variational formulation over both the value function and the policy. The objective function of the variational formulation consists of two parts: one for maximizing the value function and the other for minimizing the Bellman residual. Besides vanilla gradient descent, which updates both the value function and the policy, we propose two variants, the clipping method and the flipping method, to speed up convergence. We also prove that, when the prefactor of the Bellman residual is sufficiently large, the fixed point of the algorithm is close to the optimal policy.
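Below is a minimal NumPy sketch of the vanilla variant, which runs simultaneous gradient descent on the value function and the policy for a small random tabular MDP. Several choices here are assumptions and are not taken from the paper. The objective is taken to be F(V, π) = -ρ·V + (λ/2) Σ_s μ(s) δ_π(s)², with δ_π(s) = V(s) - Σ_a π(a|s)(r(s,a) + γ Σ_{s'} P(s'|s,a) V(s')). The policy is a softmax over logits θ, ρ and μ are uniform, and γ, λ, the step size, and the step count are arbitrary. The helper names (`objective_and_grads`, `train_vanilla`, `value_iteration`) are hypothetical. The clipping and flipping variants are not reproduced because the abstract does not define them.

```python
import numpy as np

def softmax(theta):
    """Row-wise softmax: theta has shape (S, A); returns pi(a|s)."""
    z = theta - theta.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def objective_and_grads(V, theta, P, r, gamma, lam, rho, mu):
    """Assumed objective F = -rho.V + (lam/2) * sum_s mu(s) * delta(s)^2.

    P has shape (S, A, S) with P[s, a, t] = P(t | s, a); r has shape (S, A).
    Returns F and its gradients with respect to V and the logits theta.
    """
    pi = softmax(theta)                        # (S, A)
    Q = r + gamma * (P @ V)                    # (S, A): one-step lookahead Q_V
    delta = V - (pi * Q).sum(axis=1)           # Bellman residual under pi, (S,)
    F = -rho @ V + 0.5 * lam * np.sum(mu * delta**2)

    P_pi = np.einsum("sa,sat->st", pi, P)      # state transition matrix under pi
    w = lam * mu * delta
    grad_V = -rho + w - gamma * (P_pi.T @ w)   # -rho + lam (I - gamma P_pi)^T M delta

    g = -w[:, None] * Q                        # dF/dpi(a|s)
    grad_theta = pi * (g - (pi * g).sum(axis=1, keepdims=True))  # softmax chain rule
    return F, grad_V, grad_theta

def train_vanilla(P, r, gamma=0.9, lam=100.0, lr=2e-3, steps=20000):
    """Simultaneous gradient descent on (V, theta); returns V, pi, objective history."""
    S, A, _ = P.shape
    rho = np.full(S, 1.0 / S)                  # assumed: uniform weights on V
    mu = np.full(S, 1.0 / S)                   # assumed: uniform residual weights
    V = np.zeros(S)
    theta = np.zeros((S, A))
    history = []
    for _ in range(steps):
        F, grad_V, grad_theta = objective_and_grads(V, theta, P, r, gamma, lam, rho, mu)
        history.append(F)
        V = V - lr * grad_V                    # descent step on the value function
        theta = theta - lr * grad_theta        # descent step on the policy logits
    return V, softmax(theta), history

def value_iteration(P, r, gamma=0.9, iters=2000):
    """Reference optimal values, for comparison only."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        V = (r + gamma * (P @ V)).max(axis=1)
    return V

rng = np.random.default_rng(0)
S, A = 4, 2
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)              # normalize rows into distributions
r = rng.random((S, A))

V, pi, history = train_vanilla(P, r)
V_star = value_iteration(P, r)
print("learned V :", np.round(V, 3))
print("optimal V*:", np.round(V_star, 3))
print("learned greedy actions:", pi.argmax(axis=1))
print("optimal greedy actions:", (r + 0.9 * (P @ V_star)).argmax(axis=1))
```

Under this assumed objective, the stationary V sits above V^π by a term of order 1/λ, so the learned V should not be expected to equal V* exactly even when the greedy actions agree. The step size is kept small because the curvature in V grows linearly with λ.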
