Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure

Aviv Rosenberg; Yishay Mansour

arXiv:2009.05986 · cs.LG · October 12, 2021 · 5 cites



TL;DR

This paper introduces a regret-minimization algorithm for non-episodic factored MDPs that learns the factored structure while interacting with the environment, removing the known-structure assumption made by prior methods, and provides theoretical regret bounds.

Contribution

The paper presents the first algorithm that learns the FMDP structure while minimizing regret, together with a variant that stays efficient even when the planning oracle is restricted to non-factored actions. It also establishes a new regret lower bound for the known-structure case.

Findings

1. The algorithm minimizes regret while simultaneously learning the factored structure.

2. It can be implemented efficiently given oracle access to an FMDP planner.

3. A new lower bound for the known-structure case matches the existing regret bound of Chen et al. [2021].

Abstract

We study regret minimization in non-episodic factored Markov decision processes (FMDPs), where all existing algorithms make the strong assumption that the factored structure of the FMDP is known to the learner in advance. In this paper, we provide the first algorithm that learns the structure of the FMDP while minimizing the regret. Our algorithm is based on the optimism in the face of uncertainty principle, combined with a simple statistical method for structure learning, and can be implemented efficiently given oracle-access to an FMDP planner. Moreover, we give a variant of our algorithm that remains efficient even when the oracle is limited to non-factored actions, which is the case with almost all existing approximate planners. Finally, we leverage our techniques to prove a novel lower bound for the known-structure case, closing the gap to the regret bound of Chen et al. [2021].
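In an FMDP the transition kernel factorizes as P(s' | s, a) = ∏_i P_i(s'_i | x[Z_i]), where x = (s, a) and Z_i is the scope (set of parent variables) of factor i; "learning the structure" means identifying the scopes Z_i. The abstract only calls the structure-learning component "a simple statistical method", so the sketch below is a hypothetical illustration of one such test and not the authors' procedure. The idea it shows: if a candidate scope Z contains the true parents of factor i, then additionally conditioning on any other variables should not change the empirical next-value distribution beyond sampling noise. All function names, the data layout (a list of `(x, s_next)` tuples), and the use of a standard L1 concentration width for empirical distributions are assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict
from itertools import combinations


def l1_width(n, k, delta):
    """L1 confidence width for an empirical distribution over k outcomes
    estimated from n samples (standard bound, holds w.p. >= 1 - delta)."""
    return math.sqrt(2.0 * (k * math.log(2) + math.log(1.0 / delta)) / n)


def conditional_counts(data, factor, scope):
    """Count next values of `factor`, grouped by the values of x[scope].
    Hypothetical data layout: list of (x, s_next) tuples, x = state-action vector."""
    counts = defaultdict(Counter)
    for x, s_next in data:
        key = tuple(x[j] for j in scope)
        counts[key][s_next[factor]] += 1
    return counts


def l1_distance(c1, c2):
    """L1 distance between the empirical distributions given by two Counters."""
    n1, n2 = sum(c1.values()), sum(c2.values())
    support = set(c1) | set(c2)
    return sum(abs(c1[v] / n1 - c2[v] / n2) for v in support)


def scope_is_consistent(data, factor, scope, others, k, delta):
    """Hypothetical test: reject `scope` if refining it by another candidate
    scope changes the next-value distribution by more than the confidence width."""
    base = conditional_counts(data, factor, scope)
    for other in others:
        union = tuple(sorted(set(scope) | set(other)))
        refined = conditional_counts(data, factor, union)
        pos = [union.index(j) for j in scope]  # where scope's variables sit in union
        for key_u, c_u in refined.items():
            c_z = base[tuple(key_u[p] for p in pos)]  # coarser cell containing key_u
            gap = l1_distance(c_u, c_z)
            width = (l1_width(sum(c_u.values()), k, delta)
                     + l1_width(sum(c_z.values()), k, delta))
            if gap > width:
                return False
    return True


def consistent_scopes(data, factor, n_inputs, m, k, delta):
    """All size-m scopes over n_inputs variables that survive the test."""
    candidates = list(combinations(range(n_inputs), m))
    return [z for z in candidates
            if scope_is_consistent(data, factor, z, candidates, k, delta)]


# Toy usage: three binary inputs; the next value of factor 0 copies input x[1].
rng = random.Random(0)
data = []
for _ in range(2000):
    x = tuple(rng.randint(0, 1) for _ in range(3))
    data.append((x, (x[1],)))

found = consistent_scopes(data, factor=0, n_inputs=3, m=1, k=2, delta=0.01)
print(found)  # [(1,)]
```

Only the true parent scope `(1,)` survives: conditioning on `x[0]` or `x[2]` alone leaves the next value roughly 50/50, which visibly disagrees with the refined distributions once `x[1]` is added. A full algorithm would also need a union bound over all tests and would plan optimistically with respect to every scope still deemed consistent; both are omitted here.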


Code & Models

Repositories

avivros007/factored-mdp-with-unknown-structure (official)

Videos

Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Bayesian Modeling and Causal Inference · Machine Learning and Algorithms