Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy

Han Zhong; Xun Deng; Ethan X. Fang; Zhuoran Yang; Zhaoran Wang; Runze Li

arXiv:2012.14098 · cs.LG · September 19, 2023 · 5 cites

TL;DR

This paper introduces a risk-sensitive deep reinforcement learning method that optimizes policies under variance constraints, providing theoretical guarantees of global optimality and demonstrating effectiveness on real datasets.

Contribution

It develops a novel actor-critic algorithm for variance-constrained policy optimization with provable convergence to a globally optimal policy.

Findings

01. The algorithm converges to a globally optimal policy at a sublinear rate.

02. The method effectively manages risk by constraining variance in long-term rewards.

03. Numerical studies validate theoretical results on real datasets.

Abstract

While deep reinforcement learning has achieved tremendous successes in various applications, most existing works only focus on maximizing the expected value of total return and thus ignore its inherent stochasticity. Such stochasticity is also known as the aleatoric uncertainty and is closely related to the notion of risk. In this work, we make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria. In particular, we focus on a variance-constrained policy optimization problem where the goal is to find a policy that maximizes the expected value of the long-run average reward, subject to a constraint that the long-run variance of the average reward is upper bounded by a threshold. Utilizing Lagrangian and Fenchel dualities, we transform the original problem into an unconstrained saddle-point policy…
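The constrained problem described in the abstract can be sketched in symbols. The notation here is an assumption made for illustration and is not taken from the page: $\rho(\pi)$ is the long-run average reward, $\eta(\pi)$ the long-run average squared reward, $\Lambda(\pi)$ the long-run variance, $\alpha$ the variance threshold, $\lambda \ge 0$ a Lagrange multiplier, and $y$ a scalar dual variable. The last line shows one way the Lagrangian and Fenchel duality steps might combine, using the identity $\rho^2 = \max_{y} (2y\rho - y^2)$. It is a sketch under these assumptions, not necessarily the exact saddle-point form used in the paper.

```latex
\begin{aligned}
\rho(\pi) &= \lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}_{\pi}\Big[\sum_{t=1}^{T} r_t\Big],
\qquad
\eta(\pi) = \lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}_{\pi}\Big[\sum_{t=1}^{T} r_t^{2}\Big], \\
\Lambda(\pi) &= \lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}_{\pi}\Big[\sum_{t=1}^{T} \big(r_t - \rho(\pi)\big)^{2}\Big]
= \eta(\pi) - \rho(\pi)^{2}, \\
&\max_{\pi}\ \rho(\pi) \quad \text{subject to} \quad \Lambda(\pi) \le \alpha, \\
\mathcal{L}(\pi, \lambda) &= \rho(\pi) - \lambda\big(\eta(\pi) - \rho(\pi)^{2} - \alpha\big) \\
&= \max_{y \in \mathbb{R}} \Big\{ \rho(\pi) - \lambda\big(\eta(\pi) - 2y\,\rho(\pi) + y^{2} - \alpha\big) \Big\},
\qquad \lambda \ge 0 .
\end{aligned}
```

Replacing $\rho(\pi)^2$ with a maximum over $y$ makes the objective linear in $\rho(\pi)$ and $\eta(\pi)$ for fixed $\lambda$ and $y$, so it behaves like an average-reward problem with a modified per-step reward.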

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Energy, Environment, and Transportation Policies