STOPS: Short-Term-based Volatility-controlled Policy Search and its Global Convergence
Liangliang Xu, Daoming Lyu, Yangchen Pan, Aiwen Jiang, Bo Liu

TL;DR
STOPS introduces a risk-averse policy search method that learns from short-term trajectories, avoiding hazardous states and achieving global optimality with convergence guarantees, demonstrated on MuJoCo tasks.
Contribution
The paper proposes STOPS, a novel risk-averse policy search algorithm that uses short-term trajectories and guarantees global convergence with state-of-the-art performance.
Findings
Achieves global optimality at a sublinear rate.
Converges at a rate comparable to that of risk-neutral policy search methods.
Demonstrates superior risk-averse performance on MuJoCo tasks.
Abstract
It remains challenging to deploy existing risk-averse approaches to real-world applications. The reasons are multi-fold, including the lack of a global optimality guarantee and the necessity of learning from long-term consecutive trajectories. Long-term consecutive trajectories are prone to visiting hazardous states, which is a major concern in the risk-averse setting. This paper proposes Short-Term VOlatility-controlled Policy Search (STOPS), a novel algorithm that solves risk-averse problems by learning from short-term trajectories instead of long-term trajectories. Short-term trajectories are more flexible to generate and can avoid the danger of hazardous state visitations. By using an actor-critic scheme with an overparameterized two-layer neural network, our algorithm finds a globally optimal policy at a sublinear rate with proximal policy optimization and natural policy…
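As a rough illustration of what a volatility-controlled objective estimated from short-term trajectories might look like, the sketch below computes the mean per-step reward minus a penalty on the variance of per-step rewards, pooling rewards from short segments that need not be consecutive. This is an assumption made for illustration: the abstract excerpt above does not define the objective, and the function name `mean_volatility`, the trade-off weight `lam`, and the undiscounted averaging are all hypothetical simplifications rather than the paper's formulation.

```python
def mean_volatility(short_trajectories, lam=1.0):
    """Return (objective, mean_reward, volatility) for pooled short reward segments.

    short_trajectories: list of sequences of per-step rewards. Segments may
    have different lengths and need not come from one long rollout.
    lam: hypothetical risk-aversion weight on the per-step reward variance.
    """
    # Pool every per-step reward; no segment needs to follow another.
    rewards = [float(r) for segment in short_trajectories for r in segment]
    if not rewards:
        raise ValueError("need at least one reward")

    mean_reward = sum(rewards) / len(rewards)
    # Population variance of per-step rewards around the mean reward.
    volatility = sum((r - mean_reward) ** 2 for r in rewards) / len(rewards)
    # Risk-averse score: reward level penalized by reward variability.
    objective = mean_reward - lam * volatility
    return objective, mean_reward, volatility


# Three short segments of different lengths (made-up rewards).
segments = [[1.0, 1.0, 1.0], [0.0, 2.0], [1.0]]
objective, mean_reward, volatility = mean_volatility(segments, lam=0.5)
```

Because the estimate depends only on pooled per-step rewards, it can in principle be formed without a single long consecutive rollout, which is the property the abstract attributes to learning from short-term trajectories.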
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
