Partially Observable Reference Policy Programming: Solving POMDPs Sans Numerical Optimisation

Edward Kim; Hanna Kurniawati

arXiv:2507.12186·cs.AI·July 17, 2025

Open Access

TL;DR

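This paper introduces an anytime online approximate POMDP solver that samples future histories very deeply while forcing gradual policy updates. Its performance loss is bounded by the average, rather than the maximum, of the sampling approximation errors, and it outperforms current online benchmarks on two large-scale problems with dynamically evolving environments.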

Contribution

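It presents Partially Observable Reference Policy Programming, a novel anytime online algorithm with theoretical guarantees on performance loss, and reports empirical results that considerably outperform current online benchmarks on two large-scale POMDP problems with dynamically evolving environments.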

Findings

01

Considerably outperforms current online benchmarks on two large-scale problems with dynamically evolving environments

02

Bounds the performance loss by the average, rather than the usual maximum, of the sampling approximation errors

03

Applied to a helicopter emergency scenario in the Corsica region requiring approximately 150 planning steps

Abstract

This paper proposes Partially Observable Reference Policy Programming, a novel anytime online approximate POMDP solver which samples meaningful future histories very deeply while simultaneously forcing a gradual policy update. We provide theoretical guarantees for the algorithm's underlying scheme which say that the performance loss is bounded by the average of the sampling approximation errors rather than the usual maximum, a crucial requirement given the sampling sparsity of online planning. Empirical evaluations on two large-scale problems with dynamically evolving environments -- including a helicopter emergency scenario in the Corsica region requiring approximately 150 planning steps -- corroborate the theoretical results and indicate that our solver considerably outperforms current online benchmarks.
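
The distinction between an average-based and a maximum-based error bound can be illustrated with a small numerical sketch. This is not the paper's bound or algorithm; the error values and the function names `max_error` and `average_error` are hypothetical, chosen only to show why a bound that depends on the average error can be much less pessimistic when sampling is sparse and a few estimates are poor.

```python
from statistics import mean

# Hypothetical per-iteration sampling approximation errors.
# Illustrative values only; they are not taken from the paper.
errors = [0.02, 0.03, 0.01, 0.90, 0.02, 0.04]


def max_error(errs):
    """Worst-case error: the quantity a maximum-based bound depends on."""
    return max(abs(e) for e in errs)


def average_error(errs):
    """Mean absolute error: the quantity an average-based bound depends on."""
    return mean(abs(e) for e in errs)


worst = max_error(errors)
avg = average_error(errors)

# A single badly estimated iteration dominates the maximum,
# while the average is only mildly affected by it.
print(f"max error:     {worst:.2f}")
print(f"average error: {avg:.2f}")
```

In this toy setting one outlier drives the maximum to 0.90 while the average stays at 0.17, which is the kind of gap that makes an average-based guarantee attractive under sparse sampling.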

Taxonomy

Topics: Machine Learning and Algorithms