STOPS: Short-Term-based Volatility-controlled Policy Search and its Global Convergence

Liangliang Xu; Daoming Lyu; Yangchen Pan; Aiwen Jiang; Bo Liu

arXiv:2201.09857 · cs.LG · July 25, 2022

Open Access

TL;DR

STOPS is a risk-averse policy search method that learns from short-term trajectories rather than long consecutive ones, which helps it avoid hazardous states. It comes with a global convergence guarantee and is demonstrated on MuJoCo tasks.

Contribution

The paper proposes STOPS, a risk-averse policy search algorithm that learns from short-term trajectories, is guaranteed to converge to a globally optimal policy, and reports state-of-the-art risk-averse performance.

Findings

01

Achieves global optimality at a sublinear rate.

02

Converges at a rate comparable to that of risk-neutral policy search methods.

03

Demonstrates superior risk-averse performance on MuJoCo tasks.

Abstract

It remains challenging to deploy existing risk-averse approaches in real-world applications. The reasons are manifold, including the lack of a global optimality guarantee and the need to learn from long-term consecutive trajectories. Long-term consecutive trajectories are prone to visiting hazardous states, which is a major concern in the risk-averse setting. This paper proposes Short-Term VOlatility-controlled Policy Search (STOPS), a novel algorithm that solves risk-averse problems by learning from short-term trajectories instead of long-term trajectories. Short-term trajectories are more flexible to generate and can avoid the danger of visiting hazardous states. By using an actor-critic scheme with an overparameterized two-layer neural network, our algorithm finds a globally optimal policy at a sublinear rate with proximal policy optimization and natural policy…
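
The abstract does not spell out the objective, so the following is only an illustrative sketch. It assumes that "volatility" means the variance of per-step rewards and that a risk-averse score trades the mean reward off against that variance with a weight. The names `mean_volatility_score` and `risk_weight` are hypothetical and are not the paper's notation or implementation. The sketch shows why short trajectory segments are enough for this kind of score: it only needs per-step rewards, not complete long episodes.

```python
from statistics import fmean, pvariance


def mean_volatility_score(reward_segments, risk_weight=1.0):
    """Return (score, mean, volatility) for a batch of short reward segments.

    Assumption (not taken from the paper): the risk-averse score is
        mean per-step reward - risk_weight * variance of per-step reward.
    Each element of reward_segments is the reward sequence of one short
    trajectory; segments may have different lengths.
    """
    # Pool per-step rewards across all short segments.
    rewards = [float(r) for segment in reward_segments for r in segment]
    mean_reward = fmean(rewards)
    # Population variance of the pooled per-step rewards.
    volatility = pvariance(rewards, mu=mean_reward)
    score = mean_reward - risk_weight * volatility
    return score, mean_reward, volatility


# Two hypothetical batches with the same mean reward but different spread.
steady = [[1, 1, 1], [1, 1]]
risky = [[0, 2, 0], [2, 1]]
steady_score, _, _ = mean_volatility_score(steady)
risky_score, _, _ = mean_volatility_score(risky)
```

Under this assumed score, the steady batch ranks above the risky one even though both have the same mean reward, which is the behaviour a risk-averse criterion is meant to encourage.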

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)