Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs
Maris F. L. Galesloot, Roman Andriushchenko, Milan Češka, Sebastian Junges, Nils Jansen

TL;DR
This paper introduces a robust policy gradient method for Hidden-Model POMDPs that ensures performance across multiple environment models, improving robustness and scalability in uncertain decision-making scenarios.
Contribution
It combines formal verification with subgradient ascent to compute policies that are robust across a set of environment models in HM-POMDPs.
Findings
Policies are more robust than those of the baselines and generalize better to unseen environments.
Method scales to HM-POMDPs with over a hundred thousand models.
Outperforms various baselines in empirical evaluations.
Abstract
Partially observable Markov decision processes (POMDPs) model specific environments in sequential decision-making under uncertainty. Critically, optimal policies for POMDPs may not be robust against perturbations in the environment. Hidden-model POMDPs (HM-POMDPs) capture sets of different environment models, that is, POMDPs with a shared action and observation space. The intuition is that the true model is hidden among a set of potential models, and it is unknown which model will be the environment at execution time. A policy is robust for a given HM-POMDP if it achieves sufficient performance for each of its POMDPs. We compute such robust policies by combining two orthogonal techniques: (1) a deductive formal verification technique that supports tractable robust policy evaluation by computing a worst-case POMDP within the HM-POMDP, and (2) subgradient ascent to optimize the candidate…
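The two-step loop described in the abstract (evaluate the candidate policy against its worst-case model, then take a subgradient step on that model) can be illustrated with a deliberately small sketch. This is not the authors' implementation. It assumes stateless toy "models" given as reward vectors instead of POMDPs, a softmax policy instead of a finite-memory controller, and plain enumeration of the models instead of formal verification. All names (`REWARDS`, `THETA0`, `robust_value`, `subgradient_ascent`) are hypothetical.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over action preferences.
    z = np.exp(theta - theta.max())
    return z / z.sum()

def robust_value(theta, rewards):
    # Worst-case expected reward over all models (one row of `rewards` per model).
    # Assumption: enumeration stands in for the verification step of the paper.
    return float((rewards @ softmax(theta)).min())

def subgradient_ascent(rewards, theta0, steps=200, lr=1.0):
    theta = np.array(theta0, dtype=float)
    for t in range(steps):
        pi = softmax(theta)
        v = rewards @ pi                 # value of the policy in every model
        worst = int(np.argmin(v))        # index of the worst-case model
        # The gradient of the worst-case model's value is a subgradient
        # of the robust objective min_i V_i(theta).
        grad = pi * (rewards[worst] - v[worst])
        theta += lr / np.sqrt(t + 1) * grad   # diminishing step size
    return theta

# Two toy models that disagree on which action is good.
REWARDS = np.array([[1.0, 0.0], [0.0, 1.0]])
THETA0 = [2.0, 0.0]

theta = subgradient_ascent(REWARDS, THETA0)
print(robust_value(np.array(THETA0), REWARDS), robust_value(theta, REWARDS))
```

In this toy setting the initial policy favours the first action and therefore performs poorly in the second model; the subgradient steps move it toward a mixed policy whose worst-case value is higher. The diminishing step size is a common choice for subgradient methods and is an assumption here, not a detail taken from the paper.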
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Formal Methods in Verification
Methods: Sparse Evolutionary Training
