Black-Box Policy Search with Probabilistic Programs

Jan-Willem van de Meent; Brooks Paige; David Tolpin; Frank Wood

arXiv:1507.04635 · stat.ML · August 5, 2016 · 19 cites

TL;DR

This paper shows how probabilistic programs can serve as flexible, black-box policy representations in sequential decision problems. It relates classic policy gradient methods to black-box variational inference and presents case studies in the Canadian traveler problem, Rock Sample, and a diagnosis benchmark inspired by Guess Who.

Contribution

The paper represents policies as probabilistic programs and relates policy gradient techniques to black-box variational methods that generalize to probabilistic program inference.

Findings

01 Probabilistic programs can represent policies using moderate numbers of parameters.

02 The approach applies to diverse problems, including the Canadian traveler problem, Rock Sample, and a Guess Who-inspired benchmark for optimal diagnosis.

03 Each case study illustrates that a program can represent a policy efficiently.

Abstract

In this work, we explore how probabilistic programs can be used to represent policies in sequential decision problems. In this formulation, a probabilistic program is a black-box stochastic simulator for both the problem domain and the agent. We relate classic policy gradient techniques to recently introduced black-box variational methods which generalize to probabilistic program inference. We present case studies in the Canadian traveler problem, Rock Sample, and a benchmark for optimal diagnosis inspired by Guess Who. Each study illustrates how programs can efficiently represent policies using moderate numbers of parameters.
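To make the policy-gradient idea in the abstract concrete, here is a minimal sketch of score-function (likelihood-ratio) policy search against a black-box simulator. It is not the authors' implementation, which is written as probabilistic programs. The function names `simulate` and `policy_search`, the single-parameter Bernoulli policy, and the toy reward are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate(theta, rng):
    """Hypothetical black-box simulator of domain plus agent.

    Samples one binary action from a Bernoulli policy with
    probability sigmoid(theta) and returns (reward, score), where
    score is d/dtheta of the log-probability of the sampled action.
    """
    p = sigmoid(theta)
    action = 1 if rng.random() < p else 0
    reward = 1.0 if action == 1 else 0.0  # toy reward (assumption)
    score = action - p  # gradient of log Bernoulli(action; sigmoid(theta))
    return reward, score


def policy_search(steps=200, batch=20, lr=0.5, seed=0):
    """Stochastic gradient ascent on expected reward.

    The gradient is estimated as the batch mean of
    (reward - baseline) * score; the batch-mean baseline reduces
    variance at the cost of a small bias.
    """
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        samples = [simulate(theta, rng) for _ in range(batch)]
        baseline = sum(r for r, _ in samples) / batch
        grad = sum((r - baseline) * s for r, s in samples) / batch
        theta += lr * grad
    return theta


if __name__ == "__main__":
    theta = policy_search()
    print("P(rewarded action) after search:", sigmoid(theta))
```

The update only needs sampled rewards and the gradient of the log-probability of the sampled choices, so the simulator itself can stay a black box. Here the learned probability of the rewarded action moves above its initial value of 0.5.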

Code & Models

Repositories

https://bitbucket.org/probprog/black-box-policy-search (official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Bayesian Modeling and Causal Inference