Polynomial Time Reinforcement Learning in Factored State MDPs with Linear Value Functions

Zihao Deng; Siddartha Devic; Brendan Juba

arXiv:2107.05187 · cs.LG · March 8, 2022

TL;DR

This paper introduces a polynomial-time reinforcement learning algorithm for factored state MDPs that depends on neither an oracle planner nor a linear transition model, and that does not assume the transitions on different factors are conditionally independent.

Contribution

It presents the first polynomial-time RL algorithm for Factored State MDPs that only requires a linear value function with a local basis, without assuming transition independence.
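The contribution rests on a value function that is linear in a set of basis functions, each of which reads only a few state factors. A minimal sketch of that representation, assuming a small binary factored state and two hand-picked basis functions (the names `LocalBasis` and `linear_value` and the toy basis are hypothetical illustrations, not taken from the paper):

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

State = Tuple[int, ...]


@dataclass(frozen=True)
class LocalBasis:
    """Hypothetical basis function h_i that reads only the state factors in `scope`."""
    scope: Tuple[int, ...]
    fn: Callable[..., float]

    def __call__(self, x: State) -> float:
        # Restrict the full factored state to this function's (small) scope.
        return self.fn(*(x[j] for j in self.scope))


def linear_value(weights: Sequence[float], basis: Sequence[LocalBasis], x: State) -> float:
    """V_w(x) = sum_i w_i * h_i(x[C_i]): linear in w, local in x."""
    return sum(w * h(x) for w, h in zip(weights, basis))


# Toy example (assumed): three binary factors, two basis functions with overlapping scopes.
basis = [
    LocalBasis(scope=(0, 1), fn=lambda a, b: float(a and b)),
    LocalBasis(scope=(1, 2), fn=lambda b, c: float(b != c)),
]
print(linear_value([2.0, -1.0], basis, (1, 1, 0)))  # 1.0
```

Because each basis function touches only its own scope, the number of weights grows with the number of basis functions rather than with the exponentially many joint states.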

Findings

01. Achieves polynomial-time RL in factored state spaces
02. Does not rely on oracle planners or linear transition models
03. Handles dependent factor transitions

Abstract

Many reinforcement learning (RL) environments in practice feature enormous state spaces that may be described compactly by a "factored" structure and modeled by Factored Markov Decision Processes (FMDPs). We present the first polynomial-time algorithm for RL in Factored State MDPs (generalizing FMDPs) that neither relies on an oracle planner nor requires a linear transition model; it only requires a linear value function with a suitable local basis with respect to the factorization, permitting efficient variable elimination. With this assumption, we can solve this family of Factored State MDPs in polynomial time by constructing an efficient separation oracle for convex optimization. Importantly, and in contrast to prior work on FMDPs, we do not assume that the transitions on various factors are conditionally independent.
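The abstract does not spell out the separation oracle itself. As a rough illustration of the general idea it points to: if every constraint's left-hand side is a weighted sum of functions over small scopes of state factors, then checking all exponentially many constraints at once reduces to a single max-sum problem, which variable elimination solves efficiently when the elimination order keeps intermediate scopes small. The constraint form, the helpers `max_sum_ve` and `separation_oracle`, and the toy tables below are assumptions for illustration, not the authors' construction.

```python
import itertools
from typing import Dict, List, Optional, Sequence, Tuple

Scope = Tuple[int, ...]
Table = Dict[Tuple[int, ...], float]  # keyed by values of the scope's variables, in scope order
Factor = Tuple[Scope, Table]


def max_sum_ve(factors: List[Factor], domains: Sequence[int], order: Sequence[int]):
    """Return (max_x sum_f f(x[scope_f]), argmax x) by variable elimination.

    `order` must list every variable exactly once.
    """
    factors = list(factors)
    trace = []  # (eliminated var, remaining scope, best value of var per context)
    for var in order:
        touching = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        new_scope = tuple(sorted({v for s, _ in touching for v in s} - {var}))
        new_table: Table = {}
        choice: Dict[Tuple[int, ...], int] = {}
        for ctx in itertools.product(*(range(domains[v]) for v in new_scope)):
            assign = dict(zip(new_scope, ctx))
            best_val, best = float("-inf"), 0
            for val in range(domains[var]):
                assign[var] = val
                total = sum(t[tuple(assign[v] for v in s)] for s, t in touching)
                if total > best_val:
                    best_val, best = total, val
            new_table[ctx], choice[ctx] = best_val, best
        factors.append((new_scope, new_table))
        trace.append((var, new_scope, choice))
    value = sum(t[()] for _, t in factors)  # only empty-scope factors remain
    x: Dict[int, int] = {}
    for var, scope, choice in reversed(trace):  # back-substitute the maximizer
        x[var] = choice[tuple(x[v] for v in scope)]
    return value, tuple(x[v] for v in range(len(domains)))


def separation_oracle(w: Sequence[float], features: List[Factor], b: float,
                      domains: Sequence[int], order: Sequence[int]) -> Optional[tuple]:
    """Check sum_i w_i * g_i(x) <= b for every joint state x (hypothetical constraint form).

    Returns None if w satisfies all constraints, else (x*, g(x*)): the most violated
    state and the coefficient vector of the cutting plane g(x*) . w <= b.
    """
    weighted = [(s, {k: wi * v for k, v in t.items()}) for wi, (s, t) in zip(w, features)]
    value, x = max_sum_ve(weighted, domains, order)
    if value <= b:
        return None
    return x, [t[tuple(x[v] for v in s)] for s, t in features]


# Toy example (assumed): three binary factors, two local features with overlapping scopes.
domains = [2, 2, 2]
features = [
    ((0, 1), {(a, c): float(a and c) for a in (0, 1) for c in (0, 1)}),
    ((1, 2), {(c, d): float(c != d) for c in (0, 1) for d in (0, 1)}),
]
print(separation_oracle([2.0, 1.0], features, 2.5, domains, [0, 1, 2]))  # ((1, 1, 0), [1.0, 1.0])
print(separation_oracle([1.0, 1.0], features, 2.5, domains, [0, 1, 2]))  # None
```

Each elimination step enumerates only the joint domain of the new scope plus the eliminated variable, so the cost is exponential in the largest such scope (the induced width of the order) rather than in the number of state factors. A cutting-plane or ellipsoid method can then use the returned coefficient vector as the separating hyperplane.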

Taxonomy

Topics: Reinforcement Learning in Robotics