Model-Based Actor-Critic with Chance Constraint for Stochastic System

Baiyu Peng; Yao Mu; Yang Guan; Shengbo Eben Li; Yuming Yin; Jianyu Chen

arXiv:2012.10716·cs.LG·March 17, 2021

TL;DR

This paper introduces a model-based actor-critic algorithm that efficiently learns safe policies in stochastic systems by directly solving the chance-constrained problem. It converges five times faster than previous RL methods and is 100 times more computationally efficient than traditional safety techniques.

Contribution

The proposed CCAC algorithm solves the chance-constrained RL problem directly, improving convergence speed and safety performance over prior methods that either converge slowly or learn only a conservative policy.
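For readers unfamiliar with the term, a chance-constrained RL problem is often written in the generic form below. This is a sketch, not the paper's own formulation: the symbols (policy \pi, reward r, discount \gamma, horizon T, safe set \mathcal{S}_{\text{safe}}, tolerated violation probability \delta) are assumptions chosen for illustration.

```latex
\begin{aligned}
\max_{\pi} \quad & \mathbb{E}\!\left[ \sum_{t=0}^{T-1} \gamma^{t}\, r(s_t, a_t) \right] \\
\text{s.t.} \quad & \Pr\!\left( s_t \in \mathcal{S}_{\text{safe}} \ \ \forall\, t \in \{1, \dots, T\} \right) \ \ge\ 1 - \delta
\end{aligned}
```

The constraint requires the whole trajectory to stay safe with probability at least 1 - \delta, rather than requiring safety in every possible outcome. That is why this form suits systems with unbounded or hard-to-bound noise.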

Findings

01. CCAC achieves five times faster convergence than previous RL methods.

02. CCAC maintains safety while improving performance in stochastic car-following tasks.

03. CCAC is 100 times more computationally efficient than traditional safety techniques.

Abstract

Safety is essential for reinforcement learning (RL) applied in real-world situations. Chance constraints are well suited to representing safety requirements in stochastic systems. Previous chance-constrained RL methods usually have a low convergence rate or learn only a conservative policy. In this paper, we propose a model-based chance-constrained actor-critic (CCAC) algorithm that can efficiently learn a safe and non-conservative policy. Unlike existing methods that optimize a conservative lower bound, CCAC directly solves the original chance-constrained problem, in which the objective function and the safe probability are simultaneously optimized with adaptive weights. To improve the convergence rate, CCAC uses the gradient of the dynamic model to accelerate policy optimization. The effectiveness of CCAC is demonstrated on a stochastic car-following task. Experiments…
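To make the idea of a "safe probability" traded off with an adaptive weight more concrete, here is a minimal Python sketch. It is not the CCAC algorithm. It estimates the safe probability of a fixed policy by Monte Carlo rollouts on a toy car-following model, then applies a generic dual-ascent style weight update. The dynamics, the linear feedback policy, the constants, and the function names (`rollout_is_safe`, `estimate_safe_probability`, `update_weight`) are all hypothetical and chosen only for illustration.

```python
import random


def rollout_is_safe(gain, rng, horizon=50, dt=0.1, min_gap=2.0):
    """Simulate one episode of a toy 1-D car-following model.

    Returns True if the gap to the lead car never drops below min_gap.
    The dynamics and all constants are illustrative assumptions.
    """
    gap, rel_speed = 10.0, 0.0  # rel_speed = lead speed minus ego speed
    target_gap = 5.0
    for _ in range(horizon):
        # Hypothetical linear feedback policy for the ego car's acceleration.
        accel = gain * (gap - target_gap) + 1.0 * rel_speed
        # The lead car's acceleration is modeled as Gaussian noise.
        lead_accel = rng.gauss(0.0, 1.0)
        rel_speed += (lead_accel - accel) * dt
        gap += rel_speed * dt
        if gap < min_gap:
            return False
    return True


def estimate_safe_probability(gain, n_rollouts=2000, seed=0):
    """Monte Carlo estimate of the probability that a whole episode is safe."""
    rng = random.Random(seed)
    safe = sum(rollout_is_safe(gain, rng) for _ in range(n_rollouts))
    return safe / n_rollouts


def update_weight(weight, safe_prob, required=0.95, step=5.0):
    """Generic projected dual-ascent step (an assumption, not the paper's rule).

    The weight on the safety term grows while the estimated safe probability
    is below the required level and shrinks, down to zero, once it is met.
    """
    return max(0.0, weight + step * (required - safe_prob))


if __name__ == "__main__":
    p_safe = estimate_safe_probability(gain=0.5)
    print("estimated safe probability:", p_safe)
    print("updated safety weight:", update_weight(1.0, p_safe))
```

In a full training loop, the weight returned by `update_weight` would scale a safety term added to the policy objective, so the policy is pushed toward safety only as hard as the constraint currently requires. Monte Carlo estimation is used here purely for simplicity.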

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Simulation Techniques and Applications

Methods: Chance-Constrained Actor-Critic (CCAC)