Model-Based Actor-Critic with Chance Constraint for Stochastic System
Baiyu Peng, Yao Mu, Yang Guan, Shengbo Eben Li, Yuming Yin, Jianyu Chen

TL;DR
This paper introduces a model-based actor-critic algorithm that efficiently learns safe policies in stochastic systems by directly solving chance constraints, demonstrating faster convergence and higher efficiency than existing methods.
Contribution
The proposed CCAC algorithm directly addresses chance constraints in RL, improving convergence speed and safety performance over prior conservative or slow methods.
Findings
CCAC achieves five times faster convergence than previous RL methods.
CCAC maintains safety while improving performance in stochastic car-following tasks.
CCAC is 100 times more computationally efficient than traditional safety techniques.
Abstract
Safety is essential for reinforcement learning (RL) applied in real-world situations. Chance constraints are well suited to representing safety requirements in stochastic systems. Previous chance-constrained RL methods usually have a low convergence rate or learn only a conservative policy. In this paper, we propose a model-based chance-constrained actor-critic (CCAC) algorithm that can efficiently learn a safe and non-conservative policy. Unlike existing methods that optimize a conservative lower bound, CCAC directly solves the original chance-constrained problem, in which the objective function and the safe probability are simultaneously optimized with adaptive weights. To improve the convergence rate, CCAC uses the gradient of the dynamic model to accelerate policy optimization. The effectiveness of CCAC is demonstrated on a stochastic car-following task. Experiments…
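To make the notion of a chance constraint concrete, the sketch below estimates a "safe probability" by Monte Carlo rollouts and compares it with a required level 1 - delta. It is an illustration only, not the CCAC algorithm: the toy car-following dynamics, the proportional-derivative controller, and every name and constant (`rollout_is_safe`, `estimate_safe_probability`, `gain`, `min_gap`, `delta`, and so on) are assumptions introduced here, not taken from the paper.

```python
import random


def rollout_is_safe(gain, noise_std, rng, horizon=50, min_gap=2.0):
    """Simulate one episode of a toy car-following system (hypothetical dynamics).

    Returns True if the gap to the lead vehicle never drops below min_gap.
    """
    dt = 0.1            # time step in seconds (assumed)
    target_gap = 5.0    # desired following distance in metres (assumed)
    gap = 10.0          # initial distance to the lead vehicle
    rel_speed = 0.0     # lead speed minus ego speed
    for _ in range(horizon):
        # Simple PD controller, a stand-in for a learned policy.
        ego_accel = gain * (gap - target_gap) + 1.0 * rel_speed
        # The lead vehicle's acceleration is the source of randomness.
        lead_accel = rng.gauss(0.0, noise_std)
        rel_speed += (lead_accel - ego_accel) * dt
        gap += rel_speed * dt
        if gap < min_gap:
            return False
    return True


def estimate_safe_probability(gain, noise_std=0.5, n_rollouts=2000, seed=0):
    """Monte Carlo estimate of Pr{every state in the episode is safe}."""
    rng = random.Random(seed)
    n_safe = sum(rollout_is_safe(gain, noise_std, rng) for _ in range(n_rollouts))
    return n_safe / n_rollouts


def satisfies_chance_constraint(safe_probability, delta):
    """Chance constraint: the safe probability must be at least 1 - delta."""
    return safe_probability >= 1.0 - delta
```

For example, `satisfies_chance_constraint(estimate_safe_probability(0.5), delta=0.05)` checks whether the toy controller with gain 0.5 stays safe in at least 95% of the sampled episodes. A chance-constrained method then seeks the best-performing policy among those that pass this kind of check, rather than requiring safety in every possible realization of the noise.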
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Simulation Techniques and Applications
Methods: Confidence Calibration with an Auxiliary Class
