Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs

Maris F. L. Galesloot; Roman Andriushchenko; Milan Češka; Sebastian Junges; Nils Jansen

arXiv:2505.09518·cs.AI·August 21, 2025

TL;DR

This paper introduces a robust policy gradient method for hidden-model POMDPs (HM-POMDPs). It computes a single finite-memory policy that achieves sufficient performance on every environment model in a given set, and it scales to sets of over a hundred thousand models.

Contribution

The method combines deductive formal verification, which finds a worst-case POMDP within the HM-POMDP, with subgradient ascent to compute policies that are robust across the whole set of environment models.

Findings

01. Compared to the baselines, the computed policies are more robust and generalize better to unseen environments.

02. The method scales to HM-POMDPs with over a hundred thousand models.

03. The method outperforms various baselines in the empirical evaluation.

Abstract

Partially observable Markov decision processes (POMDPs) model specific environments in sequential decision-making under uncertainty. Critically, optimal policies for POMDPs may not be robust against perturbations in the environment. Hidden-model POMDPs (HM-POMDPs) capture sets of different environment models, that is, POMDPs with a shared action and observation space. The intuition is that the true model is hidden among a set of potential models, and it is unknown which model will be the environment at execution time. A policy is robust for a given HM-POMDP if it achieves sufficient performance for each of its POMDPs. We compute such robust policies by combining two orthogonal techniques: (1) a deductive formal verification technique that supports tractable robust policy evaluation by computing a worst-case POMDP within the HM-POMDP, and (2) subgradient ascent to optimize the candidate…
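The interplay between worst-case evaluation and subgradient ascent described in the abstract can be illustrated with a deliberately tiny sketch. It is not the authors' algorithm: it assumes a one-step decision problem, a memoryless softmax policy instead of a finite-memory one, and a plain enumeration over models standing in for the deductive verification step. The names `robust_value`, `subgradient_ascent`, and the `rewards` matrix are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(theta):
    """Numerically stable softmax over action preferences."""
    z = np.exp(theta - theta.max())
    return z / z.sum()


def robust_value(theta, rewards):
    """Return the worst-case value over all models and the index of that model.

    rewards[k, a] is the (assumed) reward of action a in model k. Enumerating
    the models here is a stand-in for the verification-based worst-case search.
    """
    pi = softmax(theta)
    values = rewards @ pi          # expected reward of the policy in each model
    worst = int(np.argmin(values))
    return float(values[worst]), worst


def subgradient_ascent(rewards, theta0, steps=500, lr0=0.5):
    """Maximize min_k V_k(theta) by following the gradient of the worst model."""
    theta = np.array(theta0, dtype=float)
    best_value, _ = robust_value(theta, rewards)
    best_theta = theta.copy()
    for t in range(steps):
        value, worst = robust_value(theta, rewards)
        if value > best_value:
            best_value, best_theta = value, theta.copy()
        pi = softmax(theta)
        # Gradient of pi . r with respect to the softmax parameters,
        # taken for the current worst-case model only (a subgradient of the min).
        grad = pi * (rewards[worst] - value)
        theta += lr0 / np.sqrt(t + 1) * grad   # diminishing step size
    value, _ = robust_value(theta, rewards)
    if value > best_value:
        best_value, best_theta = value, theta.copy()
    return best_theta, best_value
```

With three toy models in which the first rewards only action 0, the second rewards only action 1, and the third pays 0.6 for either action, a policy tuned to one model performs poorly on another, while the robust iterate moves toward an even mix of the two actions. Tracking the best iterate matters because subgradient steps on a minimum of functions are not guaranteed to improve the objective at every step.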

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Formal Methods in Verification

Methods: Sparse Evolutionary Training