Offline Oracle-Efficient Learning for Contextual MDPs via Layerwise Exploration-Exploitation Tradeoff

Jian Qian; Haichen Hu; David Simchi-Levi

arXiv:2405.17796·cs.LG·May 29, 2024

TL;DR

This paper presents a reduction from stochastic CMDPs to offline density estimation. The resulting algorithm is efficient and statistically near-optimal, balances exploration and exploitation layer by layer, needs only O(H log T) oracle calls, and also applies to reward-free reinforcement learning.

Contribution

It introduces the first efficient reduction from CMDPs to offline density estimation without structural assumptions, with a layerwise exploration-exploitation strategy.

Findings

01 Requires only O(H log T) calls to an offline density estimation algorithm (oracle) across all T rounds.

02 Reduces the number of calls to O(H log log T) when T is known in advance.

03 Applies to reward-free reinforcement learning tasks.
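The gap between the two call counts above can be illustrated with epoch schedules. The sketch below is not taken from the paper: it assumes the oracle is invoked once per layer at the start of each epoch, that the unknown-T case uses doubling epochs, and that the known-T case uses epoch end points near T^(1 - 2^(-m)), a schedule familiar from the contextual bandit literature. The function names `doubling_epochs`, `known_horizon_epochs`, and `oracle_calls` are hypothetical.

```python
import math


def doubling_epochs(T):
    """Epoch end points 2, 4, 8, ..., capped at T (T need not be known in advance)."""
    ends, t = [], 2
    while t < T:
        ends.append(t)
        t *= 2
    ends.append(T)
    return ends


def known_horizon_epochs(T):
    """Epoch end points near T ** (1 - 2 ** -m), capped at T (requires knowing T)."""
    ends, m = [], 1
    # Once 2 ** m >= log2(T), the m-th end point is already within a factor 2 of T,
    # so roughly log2(log2(T)) epochs are enough.
    while 2 ** m < math.log2(T):
        end = math.ceil(T ** (1 - 2.0 ** -m))
        # Keep the end points strictly increasing and strictly below T.
        if end < T and (not ends or end > ends[-1]):
            ends.append(end)
        m += 1
    ends.append(T)
    return ends


def oracle_calls(H, epoch_ends):
    """Assumed cost model: one oracle call per layer in every epoch."""
    return H * len(epoch_ends)


if __name__ == "__main__":
    H, T = 5, 2 ** 16
    for name, schedule in [("doubling", doubling_epochs), ("known T", known_horizon_epochs)]:
        ends = schedule(T)
        print(f"{name}: {len(ends)} epochs, {oracle_calls(H, ends)} oracle calls")
```

Under these assumptions, the doubling schedule yields about log2(T) epochs (16 for T = 2^16), while the second schedule yields only about log2(log2(T)) epochs, which matches the shape of the O(H log T) and O(H log log T) bounds.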

Abstract

Motivated by the recent discovery of a statistical and computational reduction from contextual bandits to offline regression (Simchi-Levi and Xu, 2021), we address the general (stochastic) Contextual Markov Decision Process (CMDP) problem with horizon H (also known as a CMDP with H layers). In this paper, we introduce a reduction from CMDPs to offline density estimation under the realizability assumption, i.e., a model class M containing the true underlying CMDP is provided in advance. We develop an efficient, statistically near-optimal algorithm requiring only O(H log T) calls to an offline density estimation algorithm (or oracle) across all T rounds of interaction. This number can be further reduced to O(H log log T) if T is known in advance. Our results mark the first efficient and near-optimal reduction from CMDPs to offline density estimation without imposing any structural assumptions on…

Videos

Offline Oracle-Efficient Learning for Contextual MDPs via Layerwise Exploration-Exploitation Tradeoff · SlidesLive

Taxonomy

Topics: Innovative Microfluidic and Catalytic Techniques Innovation · Data Stream Mining Techniques · Machine Learning and Algorithms