Safety-guaranteed Reinforcement Learning based on Multi-class Support Vector Machine
Kwangyeon Kim, Akshita Gupta, Hong-Cheol Choi, Inseok Hwang

TL;DR
This paper introduces a model-free reinforcement learning algorithm that guarantees satisfaction of hard state constraints using a multi-class SVM, ensuring safety and optimality in discrete systems.
Contribution
It presents a novel SVM-based policy optimization method that guarantees constraint satisfaction and convergence to the optimal policy in a model-free RL setting.
Findings
Guarantees satisfaction of hard state constraints.
Ensures convergence to the optimal policy.
Demonstrated effectiveness on multiple examples.
Abstract
Several works have addressed the problem of incorporating constraints in the reinforcement learning (RL) framework; however, the majority of them can only guarantee the satisfaction of soft constraints. In this work, we address the problem of satisfying hard state constraints in a model-free RL setting with deterministic system dynamics. The proposed algorithm is developed for discrete state and action spaces and utilizes a multi-class support vector machine (SVM) to represent the policy. The state constraints are incorporated in the SVM optimization framework to derive an analytical solution for determining the policy parameters. The final policy converges to a solution that is guaranteed to satisfy the constraints. Additionally, the proposed formulation adheres to the Q-learning framework and thus also guarantees convergence to the optimal solution. The algorithm is demonstrated…
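To make the setting in the abstract concrete, here is a minimal sketch of Q-learning under a hard state constraint with deterministic dynamics and discrete states and actions. It is not the paper's method: it swaps the multi-class SVM policy for a plain tabular Q-function and enforces the constraint by masking actions. The 4x4 grid, the reward values, the `UNSAFE` set, and the `safe_actions` oracle (which peeks at the successor state) are all hypothetical choices made for illustration.

```python
import random

N = 4                                          # grid side length (hypothetical toy problem)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
START, GOAL = (0, 0), (3, 3)
UNSAFE = {(1, 1), (2, 1), (1, 3)}              # states forbidden by the hard constraint


def step(state, a):
    """Deterministic dynamics: move if the target cell is on the grid, else stay."""
    r, c = state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1]
    nxt = (r, c) if 0 <= r < N and 0 <= c < N else state
    reward = 0.0 if nxt == GOAL else -1.0
    return nxt, reward


def safe_actions(state):
    """Assumed safety oracle (not from the paper): actions whose successor is safe."""
    return [a for a in range(len(ACTIONS)) if step(state, a)[0] not in UNSAFE]


def train(episodes=500, alpha=0.5, gamma=0.95, eps=0.2, seed=0):
    """Tabular Q-learning where exploration and bootstrapping use only safe actions."""
    rng = random.Random(seed)
    states = [(r, c) for r in range(N) for c in range(N)]
    q = {(s, a): 0.0 for s in states for a in range(len(ACTIONS))}
    for _ in range(episodes):
        s = START
        for _ in range(100):                   # cap on episode length
            allowed = safe_actions(s)
            if rng.random() < eps:
                a = rng.choice(allowed)        # explore among safe actions only
            else:
                a = max(allowed, key=lambda x: q[(s, x)])
            s2, r = step(s, a)
            best_next = 0.0 if s2 == GOAL else max(q[(s2, x)] for x in safe_actions(s2))
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
            if s == GOAL:
                break
    return q


def greedy_rollout(q, max_steps=50):
    """Follow the greedy safe policy from START and record the visited states."""
    s, path = START, [START]
    while s != GOAL and len(path) <= max_steps:
        a = max(safe_actions(s), key=lambda x: q[(s, x)])
        s, _ = step(s, a)
        path.append(s)
    return path


q_table = train()
path = greedy_rollout(q_table)
print(path)
```

Because every action is drawn from `safe_actions`, the agent never enters a state in `UNSAFE` during training or rollout; that is the sense in which the constraint is hard here. The paper instead obtains this guarantee by building the constraints into the SVM optimization that determines the policy parameters.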
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Elevator Systems and Control
Methods: Support Vector Machine · Q-Learning
