Scaling Up Robust MDPs by Reinforcement Learning

Aviv Tamar; Huan Xu; Shie Mannor

arXiv:1306.6189 · cs.LG · June 27, 2013 · 28 cites

Open Access

TL;DR

This paper introduces a reinforcement learning-based method that approximately solves large-scale robust Markov decision processes, which are too large for the exact dynamic programming approaches used on small to medium-sized problems.

Contribution

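It presents a robust approximate dynamic programming method, based on a projected fixed point equation, that provably succeeds under certain technical conditions. The authors describe it as, to the best of their knowledge, the first attempt to scale up robust MDPs.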

Findings

01 The method approximately solves large-scale robust MDPs in simulation.

02 The method provably succeeds under certain technical conditions.

03 Its effectiveness is demonstrated on an option pricing problem.

Abstract

We consider large-scale Markov decision processes (MDPs) with parameter uncertainty, under the robust MDP paradigm. Previous studies showed that robust MDPs, based on a minimax approach to handle uncertainty, can be solved using dynamic programming for small to medium sized problems. However, due to the "curse of dimensionality", MDPs that model real-life problems are typically prohibitively large for such approaches. In this work we employ a reinforcement learning approach to tackle this planning problem: we develop a robust approximate dynamic programming method based on a projected fixed point equation to approximately solve large scale robust MDPs. We show that the proposed method provably succeeds under certain technical conditions, and demonstrate its effectiveness through simulation of an option pricing problem. To the best of our knowledge, this is the first attempt to scale up…
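For orientation, the minimax dynamic programming baseline that the abstract contrasts against can be sketched on a tiny tabular problem. This is a minimal illustration, not the paper's projected fixed point method: the function name `robust_value_iteration`, the finite set of candidate transition models, and the assumption that nature picks the worst model independently for each state-action pair are all choices made here for the example.

```python
import numpy as np


def robust_value_iteration(rewards, transition_sets, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Tabular robust value iteration (illustrative sketch).

    rewards:         array of shape (S, A), reward for each state-action pair.
    transition_sets: array of shape (K, S, A, S), K candidate transition models.
                     Assumption: the worst model is chosen per (state, action).
    Returns the robust value function (S,) and a greedy robust policy (S,).
    """
    n_states, _ = rewards.shape
    v = np.zeros(n_states)
    for _ in range(max_iter):
        # Expected next-state value under every candidate model: shape (K, S, A).
        next_vals = transition_sets @ v
        # Nature minimizes over the candidate models for each (state, action).
        worst_case = next_vals.min(axis=0)
        # The agent maximizes over actions.
        q = rewards + gamma * worst_case
        v_new = q.max(axis=1)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    return v, q.argmax(axis=1)


# Hypothetical two-state example with a single action.
# State 0 pays reward 1; state 1 is absorbing and pays 0.
rewards = np.array([[1.0], [0.0]])
models = np.array([
    [[[1.0, 0.0]], [[0.0, 1.0]]],  # model A: state 0 loops on itself
    [[[0.0, 1.0]], [[0.0, 1.0]]],  # model B: state 0 jumps to state 1
])
robust_v, robust_policy = robust_value_iteration(rewards, models, gamma=0.9)
nominal_v, _ = robust_value_iteration(rewards, models[:1], gamma=0.9)
```

In this toy case the robust value of state 0 is much lower than its value under model A alone, because the worst case sends the agent to the zero-reward state. Each sweep touches every state, action, and candidate model, which is the cost that becomes prohibitive as the state space grows.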

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Evolutionary Algorithms and Applications