Bayes-CPACE: PAC Optimal Exploration in Continuous Space Bayes-Adaptive Markov Decision Processes
Gilwoo Lee, Sanjiban Choudhury, Brian Hou, Siddhartha S. Srinivasa

TL;DR
This paper introduces Bayes-CPACE, the first PAC optimal algorithm for continuous-space BAMDPs, using sampling and Lipschitz continuity to efficiently approximate optimal policies under model uncertainty.
Contribution
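It presents a PAC optimal algorithm for continuous BAMDPs that covers the state-belief-action space with a finite set of representative samples and exploits the Lipschitz continuity of the value function, sidestepping the intractability of computing an exact Bayes-optimal policy.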
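To illustrate how Lipschitz continuity lets a finite sample set stand in for a continuous space, here is a minimal sketch that computes the tightest optimistic value bound at an unsampled query point. It assumes states, beliefs, and actions have been flattened into one vector with a Euclidean metric. The names `lipschitz_upper_bound`, `lipschitz_const`, and `v_max` are hypothetical, and Bayes-CPACE's actual estimator may differ (for example, by averaging over several nearby samples rather than taking a single minimum).

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sample: a flattened (state, belief, action) vector and a value estimate at it.
Sample = Tuple[Sequence[float], float]


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Distance between two flattened state-belief-action vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def lipschitz_upper_bound(
    query: Sequence[float],
    samples: List[Sample],
    lipschitz_const: float,
    v_max: float,
) -> float:
    """Optimistic value at `query` implied by Lipschitz continuity.

    If |Q(x) - Q(y)| <= L * d(x, y), each sample (x_i, q_i) gives
    Q(query) <= q_i + L * d(query, x_i). The tightest bound is the minimum
    over all samples, capped at v_max, the largest achievable value.
    """
    bound = v_max
    for x_i, q_i in samples:
        bound = min(bound, q_i + lipschitz_const * euclidean(query, x_i))
    return bound


samples = [((0.0, 0.0), 1.0), ((1.0, 0.0), 3.0)]
print(lipschitz_upper_bound((0.5, 0.0), samples, lipschitz_const=2.0, v_max=10.0))  # 2.0
print(lipschitz_upper_bound((5.0, 0.0), samples, lipschitz_const=2.0, v_max=10.0))  # 10.0
```

Near the samples the bound is tight; far from every sample it falls back to `v_max`, which is what drives optimistic exploration toward poorly covered regions.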
Findings
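The algorithm is proven to be near-optimal.
Empirical results on discrete and continuous BAMDPs show consistently competitive performance against baseline approaches.
Several proposed schemes improve the algorithm's computational efficiency.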
Abstract
We present the first PAC optimal algorithm for Bayes-Adaptive Markov Decision Processes (BAMDPs) in continuous state and action spaces, to the best of our knowledge. The BAMDP framework elegantly addresses model uncertainty by incorporating Bayesian belief updates into long-term expected return. However, computing an exact optimal Bayesian policy is intractable. Our key insight is to compute a near-optimal value function by covering the continuous state-belief-action space with a finite set of representative samples and exploiting the Lipschitz continuity of the value function. We prove the near-optimality of our algorithm and analyze a number of schemes that boost the algorithm's efficiency. Finally, we empirically validate our approach on a number of discrete and continuous BAMDPs and show that the learned policy has consistently competitive performance against baseline approaches.
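The Bayesian belief update that the BAMDP folds into the expected return can be sketched as follows. This minimal example assumes a finite set of latent models φ (a simplification; the paper targets continuous spaces) and applies Bayes' rule, b'(φ) ∝ b(φ) · P(s' | s, a, φ), after each observed transition. The function name `update_belief` and the model labels are hypothetical.

```python
from typing import Callable, Dict, Hashable


def update_belief(
    belief: Dict[Hashable, float],
    likelihood: Callable[[Hashable], float],
) -> Dict[Hashable, float]:
    """Posterior over latent models after one observed transition.

    `likelihood(phi)` is assumed to return P(s' | s, a, phi) for the
    transition just observed. The result is the prior reweighted by this
    likelihood and renormalized to sum to one.
    """
    unnormalized = {phi: p * likelihood(phi) for phi, p in belief.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observed transition has zero probability under every model")
    return {phi: w / total for phi, w in unnormalized.items()}


# Two hypothetical models; the observed transition is four times likelier under "slippery".
prior = {"slippery": 0.5, "normal": 0.5}
posterior = update_belief(prior, lambda phi: 0.8 if phi == "slippery" else 0.2)
print(posterior)
```

The pair (state, belief) then serves as the augmented BAMDP state on which a value function, such as the Lipschitz-continuous one the abstract describes, is defined.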
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · AI-based Problem Solving and Planning
