TL;DR
GoSafeOpt is an algorithm for safe, scalable, global optimization of control policies for high-dimensional dynamical systems. It addresses two limitations of earlier work: most safe model-free methods are restricted to local optima, and GoSafe cannot handle high-dimensional systems.
Contribution
The paper introduces GoSafeOpt, presented as the first scalable safe exploration algorithm that can globally optimize policies for high-dimensional systems with safety and optimality guarantees.
Findings
Demonstrated on a robot arm, a system on which GoSafe would be prohibitive
Outperforms competing model-free safe learning methods
Ensures safety during exploration
Abstract
Learning optimal control policies directly on physical systems is challenging since even a single failure can lead to costly hardware damage. Most existing model-free learning methods that guarantee safety, i.e., no failures, during exploration are limited to local optima. A notable exception is the GoSafe algorithm, which, unfortunately, cannot handle high-dimensional systems and hence cannot be applied to most real-world dynamical systems. This work proposes GoSafeOpt as the first algorithm that can safely discover globally optimal policies for high-dimensional systems while giving safety and optimality guarantees. We demonstrate the superiority of GoSafeOpt over competing model-free safe learning methods on a robot arm that would be prohibitive for GoSafe.
