TL;DR
This paper introduces a PAC-Bayes framework for learning robot control policies that provably generalize to novel environments. The resulting optimization problems are solved with convex programming or stochastic gradient descent, and the approach is demonstrated on robotic tasks.
Contribution
It presents a novel PAC-Bayes-based approach for control policy generalization, including algorithms for finite and continuous policy spaces, with theoretical guarantees and practical robotic applications.
Findings
Policies for obstacle avoidance and grasping learned and evaluated in simulation.
Hardware validation on a drone navigating through obstacles.
Strong generalization guarantees for neural network policies in robotics.
Abstract
Our goal is to learn control policies for robots that provably generalize well to novel environments given a dataset of example environments. The key technical idea behind our approach is to leverage tools from generalization theory in machine learning by exploiting a precise analogy (which we present in the form of a reduction) between generalization of control policies to novel environments and generalization of hypotheses in the supervised learning setting. In particular, we utilize the Probably Approximately Correct (PAC)-Bayes framework, which allows us to obtain upper bounds that hold with high probability on the expected cost of (stochastic) control policies across novel environments. We propose policy learning algorithms that explicitly seek to minimize this upper bound. The corresponding optimization problem can be solved using convex optimization (Relative Entropy Programming…
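As a rough illustration of the kind of upper bound the abstract describes, the sketch below evaluates one standard PAC-Bayes bound for a finite policy space. It assumes costs lie in [0, 1] and uses the common form "empirical cost + sqrt((KL(P || P0) + log(2 sqrt(N) / delta)) / (2N))", where N is the number of training environments, P0 is a prior over policies fixed before seeing the data, and P is the learned distribution. The exact bound used in the paper may differ, and the names `kl_divergence`, `pac_bayes_bound`, and the `costs` matrix layout are hypothetical choices made for this example.

```python
import numpy as np


def kl_divergence(p, p0):
    """KL(p || p0) for discrete distributions; entries with p_i = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / p0[mask])))


def pac_bayes_bound(costs, p, p0, delta=0.01):
    """Upper bound on the expected cost of a stochastic policy in novel environments.

    costs: (N, L) array, costs[i, j] in [0, 1] is the cost of policy j in
           training environment i (hypothetical layout for this sketch).
    p:     learned distribution over the L policies.
    p0:    prior distribution over the L policies, chosen before seeing data.
    delta: the bound is claimed to hold with probability at least 1 - delta.
    """
    costs = np.asarray(costs, dtype=float)
    p = np.asarray(p, dtype=float)
    n_envs = costs.shape[0]
    # Average training cost of the stochastic policy that samples from p.
    empirical_cost = float(np.mean(costs @ p))
    # Regularizer: grows with KL(p || p0), shrinks as more environments are seen.
    regularizer = np.sqrt(
        (kl_divergence(p, p0) + np.log(2.0 * np.sqrt(n_envs) / delta))
        / (2.0 * n_envs)
    )
    return empirical_cost + float(regularizer)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    costs = rng.uniform(0.0, 1.0, size=(100, 4))  # synthetic costs, illustration only
    prior = np.full(4, 0.25)
    posterior = np.array([0.7, 0.1, 0.1, 0.1])
    print(pac_bayes_bound(costs, posterior, prior, delta=0.01))
```

A learning algorithm in this spirit would search over `p` to make the returned value as small as possible, trading off low training cost against staying close to the prior.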
