Variance Adjusted Actor Critic Algorithms
Aviv Tamar, Shie Mannor

TL;DR
This paper introduces an actor-critic framework for Markov Decision Processes (MDPs) that optimizes a variance-adjusted expected return. The critic uses linear function approximation, and the episodic algorithm is proven to converge almost surely to a local optimum of the objective.
Contribution
It extends compatible features to variance-adjusted objectives and provides an episodic actor-critic algorithm with convergence guarantees.
Findings
Converges almost surely to local optima
Extends compatible features to variance-adjusted setting
Demonstrates effectiveness of the proposed algorithm
Abstract
We present an actor-critic framework for MDPs where the objective is the variance-adjusted expected return. Our critic uses linear function approximation, and we extend the concept of compatible features to the variance-adjusted setting. We present an episodic actor-critic algorithm and show that it converges almost surely to a locally optimal point of the objective function.
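To make the objective concrete, here is a minimal sketch of a Monte Carlo estimate of a variance-adjusted return from sampled episodic returns. The abstract does not spell out the exact form of the adjustment, so the sketch assumes the common penalized form J - mu * V, where J is the expected return, V is the variance of the return, and mu is a hypothetical risk-penalty coefficient. The function name and the sample data are illustrative only and are not taken from the paper.

```python
def variance_adjusted_return(returns, mu):
    """Estimate J - mu * V from a list of sampled episodic returns.

    Assumption: the variance adjustment is a linear penalty with
    coefficient mu; this form is not stated in the abstract.
    """
    n = len(returns)
    j_hat = sum(returns) / n                 # estimate of expected return J
    m_hat = sum(r * r for r in returns) / n  # estimate of second moment M
    v_hat = m_hat - j_hat ** 2               # variance via V = M - J^2 (biased estimate)
    return j_hat - mu * v_hat


# Two hypothetical sets of returns with the same mean but different spread.
steady = [1.0, 1.0, 1.0, 1.0]
risky = [0.0, 2.0, 0.0, 2.0]

print(variance_adjusted_return(steady, mu=0.5))
print(variance_adjusted_return(risky, mu=0.5))
```

Under this assumed objective, the two sets of returns have equal means, but the higher-variance one scores lower whenever mu is positive. Writing the variance as M - J^2 also hints at why a critic in this setting may need to track the second moment of the return as well as its mean.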
Taxonomy
Topics: Reinforcement Learning in Robotics · Auction Theory and Applications · Advanced Bandit Algorithms Research
