Sequential Monte Carlo for Policy Optimization in Continuous POMDPs

Hany Abdulsamad; Sahel Iqbal; Simo Särkkä

arXiv:2505.16732·cs.LG·December 5, 2025

Open Access

TL;DR

This paper introduces a novel policy optimization framework for continuous POMDPs using probabilistic inference and nested sequential Monte Carlo, effectively balancing exploration and exploitation in decision-making under uncertainty.

Contribution

It presents a new inference-based policy optimization method with a nested SMC algorithm for continuous POMDPs, avoiding suboptimal heuristics and improving decision-making under uncertainty.

Findings

01. Effective in standard continuous POMDP benchmarks
02. Outperforms existing methods in uncertain environments
03. Accurately estimates history-dependent policy gradients

Abstract

Optimal decision-making under partial observability requires agents to balance reducing uncertainty (exploration) against pursuing immediate objectives (exploitation). In this paper, we introduce a novel policy optimization framework for continuous partially observable Markov decision processes (POMDPs) that explicitly addresses this challenge. Our method casts policy learning as probabilistic inference in a non-Markovian Feynman-Kac model that inherently captures the value of information gathering by anticipating future observations, without requiring suboptimal approximations or handcrafted heuristics. To optimize policies under this model, we develop a nested sequential Monte Carlo (SMC) algorithm that efficiently estimates a history-dependent policy gradient under samples from the optimal trajectory distribution induced by the POMDP. We demonstrate the effectiveness of our…
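
As a rough illustration of the sequential Monte Carlo machinery the abstract refers to, the sketch below runs a plain (non-nested) particle approximation of a Feynman-Kac path measure on a toy problem. It is not the paper's algorithm: the 1D dynamics, the quadratic reward, the temperature `eta`, the Gaussian action proposal, and the function name `smc_feynman_kac` are all assumptions made for this example. The state is also fully observed here, so the nested belief-tracking layer that the paper adds for partial observability is omitted.

```python
import numpy as np


def smc_feynman_kac(num_particles=256, horizon=20, eta=1.0, seed=0):
    """Toy SMC over state-action trajectories with potentials exp(eta * reward).

    Hypothetical problem (an assumption, not from the paper):
        x' = x + u + noise,   reward = -(x'^2 + 0.1 * u^2)
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(2.0, 0.5, size=num_particles)  # initial state particles
    log_z = 0.0  # running estimate of the log normalizing constant
    mean_abs_state = []  # diagnostic: average |x| after each resampling step

    for _ in range(horizon):
        # Propose actions from a broad Gaussian, then propagate the dynamics.
        u = rng.normal(0.0, 1.0, size=num_particles)
        x_next = x + u + rng.normal(0.0, 0.1, size=num_particles)

        # Log potential: temperature-scaled reward of the proposed transition.
        log_w = eta * -(x_next**2 + 0.1 * u**2)

        # Stable weight normalization via the log-sum-exp trick.
        m = log_w.max()
        w = np.exp(log_w - m)
        log_z += m + np.log(w.mean())
        w /= w.sum()

        # Multinomial resampling: keep particles in proportion to their weight.
        idx = rng.choice(num_particles, size=num_particles, p=w)
        x = x_next[idx]
        mean_abs_state.append(np.abs(x).mean())

    return x, log_z, mean_abs_state


if __name__ == "__main__":
    particles, log_z, trace = smc_feynman_kac()
    print("final mean |x|:", np.abs(particles).mean())
    print("log normalizer estimate:", log_z)
```

With potentials of the form exp(eta * reward), resampling concentrates particles on high-reward trajectories, which is the sense in which sampling approximates a reward-tilted trajectory distribution. A larger `eta` sharpens that tilt at the cost of more severe weight degeneracy.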

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning