Variance Penalized On-Policy and Off-Policy Actor-Critic

Arushi Jain; Gandharv Patil; Ayush Jain; Khimya Khetarpal; Doina Precup

arXiv:2102.01985·cs.LG·February 4, 2021

TL;DR

This paper introduces on-policy and off-policy actor-critic algorithms that optimize a variance-penalized criterion combining the mean and variance of the return, aiming for more reliable policies with lower return variability in reinforcement learning tasks.

Contribution

It applies a recently proposed direct variance estimator, updated incrementally with temporal difference methods, to on-policy and off-policy actor-critic algorithms, and guarantees convergence to locally optimal policies for finite state-action MDPs.

Findings

01 Algorithms achieve lower return variance while maintaining expected return.

02 Converge to locally optimal policies in finite MDPs.

03 Effective in both tabular and continuous MuJoCo domains.

Abstract

Reinforcement learning algorithms are typically geared towards optimizing the expected return of an agent. However, in many practical applications, low variance in the return is desired to ensure the reliability of an algorithm. In this paper, we propose on-policy and off-policy actor-critic algorithms that optimize a performance criterion involving both mean and variance in the return. Previous work uses the second moment of return to estimate the variance indirectly. Instead, we use a much simpler recently proposed direct variance estimator which updates the estimates incrementally using temporal difference methods. Using the variance-penalized criterion, we guarantee the convergence of our algorithm to locally optimal policies for finite state action Markov decision processes. We demonstrate the utility of our algorithm in tabular and continuous MuJoCo domains. Our approach not only…
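
To make the abstract's two ingredients concrete, here is a minimal tabular sketch in Python. It is an illustration under stated assumptions, not the authors' implementation. It assumes the variance-penalized criterion takes the common form J = E[G] - psi * Var[G], with a hypothetical trade-off weight `psi`, and that the direct variance estimator follows one widely used TD form in which the squared TD error acts as the reward and gamma squared acts as the discount. The function names, step sizes, and array layout are all hypothetical.

```python
def td_mean_variance_update(v, var, s, r, s_next, done,
                            gamma=0.9, alpha=0.1, alpha_var=0.1):
    """One tabular TD(0) step for the value and a direct variance estimate.

    Assumption (not confirmed by the listing): the variance is learned by TD
    with meta-reward delta**2 and discount gamma**2.
    """
    v_next = 0.0 if done else v[s_next]
    var_next = 0.0 if done else var[s_next]

    # Ordinary TD error for the expected return.
    delta = r + gamma * v_next - v[s]
    # TD error for the variance: squared TD error as reward, gamma^2 as discount.
    delta_var = delta ** 2 + (gamma ** 2) * var_next - var[s]

    v[s] += alpha * delta
    var[s] += alpha_var * delta_var
    return delta, delta_var


def penalized_objective(mean_return, var_return, psi):
    """Assumed mean-variance criterion: higher mean, lower variance is better."""
    return mean_return - psi * var_return


# Usage: a single transition from state 0 to state 1 with reward 1.
v = [0.0, 0.0]
var = [0.0, 0.0]
delta, delta_var = td_mean_variance_update(
    v, var, s=0, r=1.0, s_next=1, done=False,
    gamma=0.5, alpha=0.5, alpha_var=0.5)
```

Both estimates are updated incrementally from the same transition, which is the property the abstract contrasts with indirect estimation through the second moment of the return. In an actor-critic setting, a quantity like `penalized_objective` would shape the policy update, but how the paper wires this into the actor is not stated in this listing.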

Code & Models

Repositories

arushi12130/VariancePenalizedActorCritic
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)