TL;DR
This paper extends reinforcement learning to solve reach-avoid optimal control problems, providing convergence guarantees and enabling safe autonomous systems through a novel Bellman backup and deep RL methods.
Contribution
The paper generalizes the reinforcement learning formulation to all reach-avoid problems, derives a new Bellman backup for them, proves convergence, and demonstrates zero-violation guarantees in complex systems.
Findings
Proposed a reach-avoid Bellman backup with contraction properties.
Proved convergence of reach-avoid Q-learning under certain conditions.
Validated the approach on nonlinear systems with intractable solutions.
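To make the first two findings concrete, here is a minimal tabular sketch of a discounted reach-avoid backup. This summary does not state the backup itself, so the formula and everything around it are assumptions: a target margin `l` (non-positive inside the goal set), a safety margin `g` (positive inside the failure set), a controller that minimizes, and a hypothetical 5-state chain with deterministic transitions. It illustrates the idea and is not the authors' implementation.

```python
import numpy as np

def reach_avoid_backup(V, l, g, next_state, gamma):
    """One sweep of an (assumed) discounted reach-avoid backup.

    V          : (S,) current value estimate
    l          : (S,) target margin, assumed <= 0 inside the goal set
    g          : (S,) safety margin, assumed > 0 inside the failure set
    next_state : (S, A) integer table of deterministic successors
    gamma      : discount factor in (0, 1)
    """
    # The controller picks the action with the lowest successor value.
    best_next = V[next_state].min(axis=1)
    # Undiscounted recursion: stay safe now (g), and either be at the
    # goal now (l) or keep going (best_next).
    undiscounted = np.maximum(g, np.minimum(l, best_next))
    # Mixing with the terminal outcome max(l, g) makes the operator a
    # gamma-contraction in the sup norm, since min and max are non-expansive.
    return (1.0 - gamma) * np.maximum(l, g) + gamma * undiscounted

def solve(l, g, next_state, gamma=0.99, iters=500):
    """Iterate the backup from the terminal outcome max(l, g)."""
    V = np.maximum(l, g)
    for _ in range(iters):
        V = reach_avoid_backup(V, l, g, next_state, gamma)
    return V

# Hypothetical toy chain: state 0 is the failure set, state 4 is the goal.
n = 5
l = np.array([1.0, 1.0, 1.0, 1.0, -1.0])
g = np.array([1.0, -1.0, -1.0, -1.0, -1.0])
states = np.arange(n)
# Two actions: move left or move right, clipped at the ends of the chain.
next_state = np.stack(
    [np.clip(states - 1, 0, n - 1), np.clip(states + 1, 0, n - 1)], axis=1
)

V = solve(l, g, next_state)
# Under the assumed sign convention, V <= 0 marks states from which the
# goal can be reached without ever entering the failure set.
reach_avoid_set = V <= 0
```

In this toy problem the failure state keeps a positive value while every other state ends up negative, so `reach_avoid_set` contains states 1 through 4. The discount factor trades exactness for the contraction property: as `gamma` approaches 1 the backup approaches the undiscounted recursion.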
Abstract
Reach-avoid optimal control problems, in which the system must reach certain goal conditions while staying clear of unacceptable failure modes, are central to safety and liveness assurance for autonomous robotic systems, but their exact solutions are intractable for complex dynamics and environments. Recent successes in reinforcement learning methods to approximately solve optimal control problems with performance objectives make their application to certification problems attractive; however, the Lagrange-type objective used in reinforcement learning is not suitable to encode temporal logic requirements. Recent work has shown promise in extending the reinforcement learning machinery to safety-type problems, whose objective is not a sum, but a minimum (or maximum) over time. In this work, we generalize the reinforcement learning formulation to handle all optimal control problems in the…
Taxonomy
Methods: Q-Learning
