Robust Parameter Learning for Uncertain MDPs

Yannik Schnitzer; Alessandro Abate; David Parker

arXiv:2605.01339·cs.LG·May 5, 2026

TL;DR

The paper learns uncertain MDPs through parametric MDPs (pMDPs), which capture dependencies between transitions and yield tighter uncertainty estimates for robust policy synthesis.

Contribution

The paper models an unknown MDP as a parametric MDP and projects statistical uncertainty from empirical transition frequencies onto the pMDP's parameter space, yielding a PAC uncertainty model that respects dependencies between transitions.

Findings

01

Tighter uncertainty bounds than classical interval methods.

02

Hierarchical polytopic outer approximations for solving the models.

03

Empirical evaluation demonstrating improved uncertainty estimation.

Abstract

Learning-based approaches to verifying unknown Markov decision processes (MDPs) often employ uncertain MDPs. These models use, for example, confidence intervals to capture transition uncertainty and allow synthesis of policies that are robust to this uncertainty. However, this approach typically quantifies uncertainty independently for individual transition probabilities, ignoring dependencies due to shared latent quantities. We propose to learn such models using parametric MDPs (pMDPs), where transition probabilities are expressions over a set of parameters. We project statistical uncertainty from empirical transition frequencies onto the pMDP's parameter space, yielding a probably approximately correct (PAC) uncertainty model for the underlying MDP that respects the algebraic dependencies between transitions. The resulting models are algorithmically challenging to solve, so we propose…
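
To make the contrast in the abstract concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a hypothetical toy pMDP in which two state-action pairs depend on a single shared parameter p (one succeeds with probability p, the other with probability 1 - p), uses made-up sample counts, and uses Hoeffding's inequality with a union bound as one common way to build the per-transition confidence intervals. It compares the independent interval for one transition with the interval obtained after projecting both intervals onto p and intersecting them.

```python
import math


def hoeffding_interval(successes, trials, delta):
    """Two-sided Hoeffding interval for a Bernoulli mean, clipped to [0, 1].

    Holds with probability at least 1 - delta for i.i.d. samples.
    """
    p_hat = successes / trials
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * trials))
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps)


# Hypothetical toy pMDP (an assumption for illustration only):
#   (s0, a) succeeds with probability p
#   (s1, a) succeeds with probability 1 - p
# Counts are (successes, trials) and are made-up data.
counts = {"s0": (62, 100), "s1": (150, 400)}

delta = 0.05
# Union bound: split the error budget so all intervals hold jointly
# with probability at least 1 - delta.
per_interval_delta = delta / len(counts)

intervals = {
    s: hoeffding_interval(k, n, per_interval_delta)
    for s, (k, n) in counts.items()
}

# Project each transition interval onto the shared parameter p.
lo0, hi0 = intervals["s0"]  # p in [lo0, hi0]
lo1, hi1 = intervals["s1"]  # 1 - p in [lo1, hi1], so p in [1 - hi1, 1 - lo1]

# The true p must satisfy every constraint at once, so intersect them.
param_lo = max(lo0, 1.0 - hi1)
param_hi = min(hi0, 1.0 - lo1)

print("independent interval for (s0, a):", (lo0, hi0))
print("interval for p after projection: ", (param_lo, param_hi))
```

In this toy setting the well-sampled transition at s1 constrains p, and the shared parameter carries that information back to the sparsely sampled transition at s0, whose independent interval is wider. With several parameters and nonlinear transition expressions, the feasible parameter set is generally no longer an interval, which is where the algorithmic difficulty mentioned in the abstract arises.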
