Black-Box Policy Search with Probabilistic Programs
Jan-Willem van de Meent, Brooks Paige, David Tolpin, Frank Wood

TL;DR
This paper demonstrates how probabilistic programs can serve as flexible, black-box policy representations in sequential decision problems, connecting policy gradient methods with variational inference, and showcasing practical case studies.
Contribution
The paper represents policies as probabilistic programs and relates classic policy gradient techniques to black-box variational inference methods.
Findings
Probabilistic programs can represent policies using moderate numbers of parameters.
The approach is applied to the Canadian traveler problem, Rock Sample, and an optimal-diagnosis benchmark inspired by Guess Who.
Each case study illustrates how a program can represent a policy efficiently.
Abstract
In this work, we explore how probabilistic programs can be used to represent policies in sequential decision problems. In this formulation, a probabilistic program is a black-box stochastic simulator for both the problem domain and the agent. We relate classic policy gradient techniques to recently introduced black-box variational methods which generalize to probabilistic program inference. We present case studies in the Canadian traveler problem, Rock Sample, and a benchmark for optimal diagnosis inspired by Guess Who. Each study illustrates how programs can efficiently represent policies using moderate numbers of parameters.
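The link between policy gradients and black-box variational methods rests on the score-function estimator: the gradient of an expected reward can be estimated from simulator rollouts alone, using only the reward and the gradient of the log-probability of the sampled choices. The sketch below is a minimal illustration of that idea, not the paper's implementation. The toy one-step problem, the reward values, and the names `simulate` and `policy_search` are all hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate(theta, rng):
    """Black-box rollout: sample one action, return (reward, d/dtheta log-prob)."""
    p = sigmoid(theta)
    action = 1 if rng.random() < p else 0
    # Hypothetical toy reward: action 1 pays 1.0, action 0 pays 0.2.
    reward = 1.0 if action == 1 else 0.2
    # For a Bernoulli(sigmoid(theta)) choice, d/dtheta log p(action) = action - p.
    grad_log_prob = action - p
    return reward, grad_log_prob


def policy_search(steps=200, samples=50, lr=0.5, seed=0):
    """Stochastic gradient ascent on expected reward with a mean-reward baseline."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        rollouts = [simulate(theta, rng) for _ in range(samples)]
        # Subtracting the batch-mean reward reduces the variance of the estimate.
        baseline = sum(r for r, _ in rollouts) / samples
        grad = sum((r - baseline) * g for r, g in rollouts) / samples
        theta += lr * grad
    return theta


theta = policy_search()
print("probability of choosing action 1:", sigmoid(theta))
```

The optimizer never inspects how `simulate` computes the reward, which is what makes the procedure black-box: any stochastic simulator that reports a reward and a log-probability gradient for its random choices could be substituted.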
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Bayesian Modeling and Causal Inference
