Model-Based Reinforcement Learning with Double Oracle Efficiency in Policy Optimization and Offline Estimation

Haichen Hu; Jian Qian; David Simchi-Levi

arXiv:2605.00393·cs.LG·May 4, 2026

TL;DR

This paper introduces a novel offline oracle-efficient reinforcement learning algorithm that achieves optimal regret bounds with an oracle complexity independent of the state and action space sizes, which makes it applicable to large and infinite environments.

Contribution

It presents the first doubly oracle-efficient RL algorithm whose oracle complexity is independent of the environment size, and extends the approach to infinite state and action spaces.

Findings

01

Achieves $\tilde{O}(\sqrt{T})$ regret with only $O(H \log\log T)$ oracle calls.

02

Oracle complexity is independent of state and action space sizes.

03

Generalizes to linear MDPs with infinite state and action spaces.
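The abstract says that the number of oracle calls grows only like $\log\log T$ in the horizon. This page does not describe the schedule the authors use. One standard way to get that count, used in earlier offline-oracle-efficient bandit work, is to refit only at epoch boundaries whose exponent approaches 1 geometrically. The sketch below assumes such a schedule, $\tau_m = \min(T, \lceil 2T^{1-2^{-m}} \rceil)$. The function names and the "$H$ calls per epoch" accounting are hypothetical illustrations, not the paper's method.

```python
import math


def epoch_schedule(T: int) -> list[int]:
    """Return hypothetical epoch end points tau_1 <= tau_2 <= ... <= T.

    Assumed schedule: tau_m = min(T, ceil(2 * T ** (1 - 2 ** -m))).
    Once 2 ** m >= log2(T), T ** (2 ** -m) <= 2, so 2 * T ** (1 - 2 ** -m) >= T
    and the schedule reaches T after roughly log2(log2(T)) + 1 epochs.
    """
    ends: list[int] = []
    m = 1
    while not ends or ends[-1] < T:
        tau = min(T, math.ceil(2 * T ** (1 - 2.0 ** (-m))))
        ends.append(tau)
        m += 1
    return ends


def total_oracle_calls(T: int, H: int) -> int:
    # Hypothetical accounting: each oracle is called once per layer h = 1..H
    # at every epoch boundary, so it is called H * (number of epochs) times.
    return H * len(epoch_schedule(T))


if __name__ == "__main__":
    T, H = 10**6, 10
    print(epoch_schedule(T))         # five end points, the last equal to T
    print(total_oracle_calls(T, H))  # 50 = H * 5
```

Between boundaries the policy stays fixed, so the oracles are never queried per episode. This is the property that keeps the call count doubly logarithmic rather than linear in $T$.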

Abstract

Reinforcement learning (RL) in large environments often suffers from severe computational bottlenecks, as conventional regret minimization algorithms require repeated, costly calls to planning and statistical estimation oracles. While recent advances have explored offline oracle-efficient algorithms, their computational complexity typically scales with the cardinality of the state and action spaces, rendering them intractable for large-scale or continuous environments. In this paper, we address this fundamental limitation by studying offline oracle-efficient episodic RL through the lens of log-barrier and log-determinant regularization. Specifically, for tabular Markov Decision Processes (MDPs), we propose a novel algorithm that achieves the optimal $\tilde{O}(\sqrt{T})$ regret bound while requiring only $O(H \log\log T)$ calls to both the offline statistical estimation and planning…
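The abstract names log-barrier regularization as a key tool but does not state the regularized problem. As a generic illustration only, and not the paper's algorithm (which presumably regularizes over richer objects such as policies or occupancy measures), the sketch below solves a log-barrier-regularized choice among $K$ actions. The inputs are estimated rewards $r$ and a hypothetical temperature $\gamma$. The problem maximizes $\sum_a p_a r_a + \gamma^{-1} \sum_a \log p_a$ over the simplex. Stationarity gives $p_a = 1/(\gamma(\lambda - r_a))$, with the multiplier $\lambda$ found by bisection. The function name and parameters are assumptions.

```python
def log_barrier_distribution(rewards: list[float], gamma: float, iters: int = 200) -> list[float]:
    """Maximize sum_a p[a] * r[a] + (1 / gamma) * sum_a log p[a] over the probability simplex.

    Zeroing the Lagrangian's gradient gives p[a] = 1 / (gamma * (lam - r[a]))
    for a multiplier lam > max(r) chosen so that the p[a] sum to one.
    """
    K = len(rewards)
    r_max = max(rewards)

    def mass(lam: float) -> float:
        # Total probability implied by multiplier lam; strictly decreasing on (r_max, inf).
        return sum(1.0 / (gamma * (lam - r)) for r in rewards)

    # At lam = r_max + 1/gamma the best action alone has mass 1, so mass(lo) >= 1;
    # at lam = r_max + K/gamma every action has mass <= 1/K, so mass(hi) <= 1.
    lo, hi = r_max + 1.0 / gamma, r_max + K / gamma
    for _ in range(iters):  # fixed-count bisection avoids stalling at float precision
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    p = [1.0 / (gamma * (lam - r)) for r in rewards]
    total = sum(p)
    return [x / total for x in p]  # renormalize away the tiny bisection residual


if __name__ == "__main__":
    print(log_barrier_distribution([1.0, 0.5, 0.0], gamma=10.0))
```

The best action receives the most mass, but the $\log p_a$ term keeps every probability strictly positive, with mass decaying like $1/(\gamma \cdot \text{gap})$. This built-in exploration is one reason log-barrier regularizers show up in oracle-efficient methods.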
