# Distributionally Robust Partially Observable Markov Decision Process with Moment-based Ambiguity

**Authors:** Hideaki Nakao, Ruiwei Jiang, Siqian Shen

arXiv: 1906.05988 · 2020-12-09

## TL;DR

This paper introduces a distributionally robust POMDP (DR-POMDP) framework with moment-based ambiguity sets. It derives policies that are more robust than standard POMDP policies when the distribution of transition-observation probabilities is misspecified.

## Contribution

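It formulates a DR-POMDP model whose ambiguity set bounds the moments of the joint distribution of transition-observation probabilities via conic constraints. It proves that the value function is piecewise linear and convex in the belief state, and it proposes a heuristic search value iteration method that computes lower and upper bounds on the value function.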
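The abstract's "heuristic search value iteration" shares its name with the standard POMDP algorithm HSVI (Smith and Simmons). HSVI brackets the optimal value function between a lower and an upper bound and refines both until their gap at the initial belief is small. Because the value function is piecewise linear and convex in the belief, a finite set of alpha vectors, each the value of some policy, gives a lower bound. Convexity also justifies a sawtooth interpolation upper bound over a set of belief-value points. The sketch below shows only these two standard bound representations. The function names and toy numbers are hypothetical, and the code does not implement the paper's distributionally robust backup or its search heuristic.

```python
from typing import Sequence, Tuple

Belief = Sequence[float]


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def lower_bound(alphas: Sequence[Belief], b: Belief) -> float:
    """PWLC lower bound: max over alpha vectors of alpha . b.

    Assumes each alpha vector is the value of some feasible policy, so the
    maximum can only underestimate the optimal value.
    """
    return max(dot(alpha, b) for alpha in alphas)


def sawtooth_upper_bound(
    corner_values: Sequence[float],
    points: Sequence[Tuple[Belief, float]],
    b: Belief,
) -> float:
    """Sawtooth upper bound for a convex value function.

    corner_values[s] must upper-bound V at the corner belief e_s, and each
    (b_i, v_i) in points must upper-bound V at belief b_i.
    """
    base = dot(corner_values, b)  # linear interpolation between the corners
    best = base
    for b_i, v_i in points:
        # Largest phi such that b - phi * b_i stays componentwise nonnegative.
        phi = min(b[s] / b_i[s] for s in range(len(b)) if b_i[s] > 0)
        # By convexity: V(b) <= phi * v_i + sum_s (b[s] - phi * b_i[s]) * corner_values[s].
        best = min(best, base + phi * (v_i - dot(corner_values, b_i)))
    return best


def gap(alphas, corner_values, points, b) -> float:
    """Width of the bound interval at belief b; HSVI refines until this is small."""
    return sawtooth_upper_bound(corner_values, points, b) - lower_bound(alphas, b)


if __name__ == "__main__":
    alphas = [[1.0, 0.0], [0.0, 1.0]]  # toy lower-bound alpha vectors
    corners = [1.0, 1.0]               # toy upper bounds at the two corner beliefs
    points = [((0.5, 0.5), 0.6)]       # toy interior upper-bound point
    for b in [(1.0, 0.0), (0.75, 0.25), (0.5, 0.5)]:
        print(b, lower_bound(alphas, b), sawtooth_upper_bound(corners, points, b))
```

With these toy inputs the bounds coincide at the corner beliefs and leave a gap of 0.1 at the uniform belief, which is where a heuristic search would focus further backups.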

## Key findings

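- DR-POMDP yields more robust policies than POMDP under misspecified transition-observation distributions, while producing less costly solutions than robust POMDP.
- DR-POMDP policies are insensitive to varying parameters in the ambiguity set and to noise added to the transition-observation probability values observed at the end of each period.
- Numerical tests on instances of a dynamic epidemic control problem demonstrate the computational performance of the approach.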

## Abstract

We consider a distributionally robust Partially Observable Markov Decision Process (DR-POMDP), where the distribution of the transition-observation probabilities is unknown at the beginning of each decision period, but their realizations can be inferred using side information at the end of each period after an action is taken. We build an ambiguity set of the joint distribution using bounded moments via conic constraints and seek an optimal policy that maximizes the worst-case (minimum) reward for any distribution in the set. We show that the value function of DR-POMDP is piecewise linear and convex with respect to the belief state and propose a heuristic search value iteration method for obtaining lower and upper bounds of the value function. We conduct numerical studies and demonstrate the computational performance of our approach via test instances of a dynamic epidemic control problem. Our results show that DR-POMDP can produce more robust policies than POMDP under misspecified distributions of transition-observation probabilities, while yielding less costly solutions than robust POMDP. The DR-POMDP policies are also insensitive to varying parameters in the ambiguity set and to noise added to the true transition-observation probability values obtained at the end of each decision period.
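The max-min objective above can be illustrated in a single decision step: choose the action that maximizes the worst-case expected reward over every distribution in a moment-based ambiguity set. The sketch below uses a deliberately simplified, hypothetical setting. The uncertain transition-observation model is reduced to a scalar parameter taking one of a few support values. The ambiguity set contains every distribution on that support whose mean lies in a given interval. The inner minimization is approximated by enumerating a grid over the probability simplex rather than through the conic moment constraints used in the paper. All names, support values, bounds, and rewards are illustrative assumptions.

```python
from itertools import product
from typing import Dict, Iterator, List, Sequence, Tuple

Dist = Tuple[float, ...]


def simplex_grid(k: int, steps: int) -> Iterator[Dist]:
    """Yield every probability vector of length k with entries in multiples of 1/steps."""
    for counts in product(range(steps + 1), repeat=k - 1):
        rest = steps - sum(counts)
        if rest >= 0:
            yield tuple(c / steps for c in counts) + (rest / steps,)


def moment_ambiguity_set(
    support: Sequence[float], mean_lo: float, mean_hi: float, steps: int = 20
) -> List[Dist]:
    """Hypothetical grid approximation of {p on support : mean_lo <= E_p[xi] <= mean_hi}."""
    tol = 1e-12  # absorb floating-point error at the mean bounds
    out = []
    for p in simplex_grid(len(support), steps):
        mean = sum(pi * xi for pi, xi in zip(p, support))
        if mean_lo - tol <= mean <= mean_hi + tol:
            out.append(p)
    return out


def worst_case_value(rewards: Sequence[float], dists: Sequence[Dist]) -> float:
    """Minimum expected reward over the (approximated) ambiguity set."""
    return min(sum(pi * ri for pi, ri in zip(p, rewards)) for p in dists)


def robust_action(rewards_by_action: Dict[str, Sequence[float]], dists: Sequence[Dist]) -> str:
    """Max-min choice: the action whose worst-case expected reward is largest."""
    return max(rewards_by_action, key=lambda a: worst_case_value(rewards_by_action[a], dists))


if __name__ == "__main__":
    support = [0.1, 0.3, 0.5]  # hypothetical infection-rate scenarios
    dists = moment_ambiguity_set(support, 0.25, 0.35)
    rewards = {"wait": [12.0, 6.0, 0.0], "treat": [8.0, 6.0, 4.0]}
    for a, r in rewards.items():
        print(a, worst_case_value(r, dists))
    print("robust choice:", robust_action(rewards, dists))
```

With these toy numbers the rewards are linear in the scenario value, so the worst case for both actions sits at the upper mean bound 0.35: 4.5 for wait and 5.5 for treat. The max-min choice is therefore treat, even though any single distribution with mean 0.25 would favor wait (7.5 versus 6.5). This is the kind of hedging against misspecification the abstract describes.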

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.05988/full.md

## Figures

35 figures with captions in the complete paper: https://tomesphere.com/paper/1906.05988/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/1906.05988/full.md

---
Source: https://tomesphere.com/paper/1906.05988