To bootstrap or to rollout? An optimal and adaptive interpolation

Wenlong Mou; Jian Qian

arXiv:2411.09731 · cs.LG · December 2, 2024

Open Access

TL;DR

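This paper introduces subgraph Bellman operators that interpolate between bootstrapping and rollout methods in reinforcement learning. The resulting policy-evaluation estimator has an error bound that approaches the optimal variance achieved by TD, up to an additional term depending on the exit probability of a selected subset of states, and a sample complexity that depends only on the occupancy measure of that subset.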

Contribution

The paper proposes a new class of Bellman operators that interpolate between the bootstrapping-based temporal difference (TD) approach and the rollout-based Monte Carlo (MC) approach, with an upper bound on the estimator's error and a complementary lower bound.

Findings

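01 The estimator's error upper bound approaches the optimal variance achieved by TD, plus an additional term that depends on the exit probability of the selected subset of states.

02 Sample complexity depends only on the occupancy measure of the selected subset.

03 A lower bound shows that the additional variance term is unavoidable.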

Abstract

Bootstrapping and rollout are two fundamental principles for value function estimation in reinforcement learning (RL). We introduce a novel class of Bellman operators, called subgraph Bellman operators, that interpolate between bootstrapping and rollout methods. Our estimator, derived by solving the fixed point of the empirical subgraph Bellman operator, combines the strengths of the bootstrapping-based temporal difference (TD) estimator and the rollout-based Monte Carlo (MC) methods. Specifically, the error upper bound of our estimator approaches the optimal variance achieved by TD, with an additional term depending on the exit probability of a selected subset of the state space. At the same time, the estimator exhibits the finite-sample adaptivity of MC, with sample complexity depending only on the occupancy measure of this subset. We complement the upper bound with an…
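The interpolation described in the abstract can be illustrated with a small tabular sketch. This is not the paper's estimator or its exact operator, only one plausible reading of the idea: inside a chosen subset of states the estimate bootstraps from its own values (as TD does), and whenever a trajectory exits the subset the observed discounted return from that point on is used instead (as MC does). The function name `subgraph_estimate`, the episode format (a list of `(state, reward)` pairs ending at termination), and the fixed-point iteration are all assumptions made for illustration.

```python
def subgraph_estimate(episodes, subset, gamma, tol=1e-12, max_iter=10_000):
    """Illustrative value estimate on `subset` (hypothetical helper).

    episodes: list of episodes, each a list of (state, reward) pairs.
    subset:   states on which to bootstrap; elsewhere rollout returns are used.
    gamma:    discount factor in [0, 1), so the iteration below contracts.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if not subset:
        return {}

    counts = {s: 0 for s in subset}      # visits to each subset state
    offset = {s: 0.0 for s in subset}    # rewards plus discounted exit returns
    trans = {s: {} for s in subset}      # counts of transitions staying inside

    for ep in episodes:
        # Discounted return-to-go at every step: the rollout (MC) quantity.
        returns = [0.0] * (len(ep) + 1)
        for t in range(len(ep) - 1, -1, -1):
            returns[t] = ep[t][1] + gamma * returns[t + 1]

        for t, (s, r) in enumerate(ep):
            if s not in counts:
                continue
            counts[s] += 1
            offset[s] += r
            if t + 1 < len(ep):
                nxt = ep[t + 1][0]
                if nxt in counts:
                    # Next state is inside the subset: bootstrap later.
                    trans[s][nxt] = trans[s].get(nxt, 0) + 1
                else:
                    # Trajectory exits the subset: plug in the rollout return.
                    offset[s] += gamma * returns[t + 1]

    unvisited = [s for s, c in counts.items() if c == 0]
    if unvisited:
        raise ValueError(f"no samples for states: {unvisited}")

    # Iterate the empirical operator to its fixed point.
    v = {s: 0.0 for s in subset}
    for _ in range(max_iter):
        new_v = {
            s: (offset[s] + gamma * sum(c * v[n] for n, c in trans[s].items()))
            / counts[s]
            for s in subset
        }
        gap = max(abs(new_v[s] - v[s]) for s in subset)
        v = new_v
        if gap < tol:
            break
    return v


# Usage on a deterministic chain 0 -> 1 -> 2 -> end with rewards 1, 2, 3.
episodes = [[(0, 1.0), (1, 2.0), (2, 3.0)]]
partial = subgraph_estimate(episodes, {0, 1}, gamma=0.5)   # rollout after exiting {0, 1}
full = subgraph_estimate(episodes, {0, 1, 2}, gamma=0.5)   # bootstraps everywhere
```

In this sketch, taking the subset to be the whole state space yields a purely bootstrapped, TD-style plug-in estimate, while shrinking the subset shifts more of the estimate onto rollout returns. How the subset should be chosen, and the guarantees that follow, are the subject of the paper and are not captured here.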

Taxonomy

Topics: Simulation Techniques and Applications