Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds

Zhiyong Wang; Dongruo Zhou; John C.S. Lui; Wen Sun

arXiv:2408.08994 · cs.LG · October 30, 2024

TL;DR

This paper demonstrates that simple model-based RL algorithms using MLE and optimistic/pessimistic planning can achieve nearly horizon-free and second-order regret bounds, with broad applicability and straightforward analysis.

Contribution

It shows that standard MLE-based model learning combined with optimistic/pessimistic planning attains strong theoretical guarantees without complex algorithmic modifications.

Findings

1. Achieves nearly horizon-free regret bounds.

2. Attains second-order, instance-dependent bounds.

3. Applicable to both online and offline RL settings.

Abstract

Learning a transition model via Maximum Likelihood Estimation (MLE) followed by planning inside the learned model is perhaps the most standard and simplest Model-based Reinforcement Learning (RL) framework. In this work, we show that such a simple Model-based RL scheme, when equipped with optimistic and pessimistic planning procedures, achieves strong regret and sample complexity bounds in online and offline RL settings. In particular, we demonstrate that under the conditions where the trajectory-wise reward is normalized between zero and one and the transition is time-homogeneous, it achieves nearly horizon-free and second-order bounds. Nearly horizon-free means that our bounds have no polynomial dependence on the horizon of the Markov Decision Process. A second-order bound is a type of instance-dependent bound that scales with respect to the variances of the returns of the policies…
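The "MLE, then plan inside the learned model" recipe described in the abstract can be illustrated with a minimal tabular sketch. This is not the paper's algorithm: it assumes a small finite state and action space (where the MLE reduces to normalized transition counts), a known reward table, and plain greedy planning, and it omits the optimistic and pessimistic planning procedures the abstract refers to. The function names `mle_transition_model` and `plan`, the uniform fallback for unvisited state-action pairs, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def mle_transition_model(transitions, n_states, n_actions):
    """Tabular MLE of a time-homogeneous model: P_hat(s'|s,a) = N(s,a,s') / N(s,a)."""
    counts = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1.0
    totals = counts.sum(axis=2, keepdims=True)
    # Assumption of this sketch: unvisited (s, a) pairs fall back to a uniform distribution.
    uniform = np.full_like(counts, 1.0 / n_states)
    return np.where(totals > 0, counts / np.maximum(totals, 1.0), uniform)


def plan(P_hat, reward, horizon):
    """Finite-horizon value iteration inside the learned model (no optimism or pessimism)."""
    n_states = P_hat.shape[0]
    V = np.zeros(n_states)
    policy = np.zeros((horizon, n_states), dtype=int)
    for h in reversed(range(horizon)):
        # The same P_hat is reused at every step h because the model is time-homogeneous.
        Q = reward + P_hat @ V  # shape (n_states, n_actions)
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V


# Toy data: (state, action, next_state) triples from a 2-state, 2-action MDP.
data = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 0, 1)]
P_hat = mle_transition_model(data, n_states=2, n_actions=2)
reward = np.array([[0.0, 1.0], [0.0, 0.0]])  # reward[s, a], hypothetical values
policy, V = plan(P_hat, reward, horizon=1)
```

Because the transition is time-homogeneous, one estimated table is shared across all steps of the horizon, so every observed transition contributes to the same estimate regardless of the step at which it was collected.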

Taxonomy

Topics: Modeling and Simulation Systems