To bootstrap or to rollout? An optimal and adaptive interpolation
Wenlong Mou, Jian Qian

TL;DR
This paper introduces subgraph Bellman operators that adaptively interpolate between bootstrapping and rollout methods in reinforcement learning for policy evaluation. The resulting estimator approaches the optimal variance of TD while retaining the finite-sample adaptivity of Monte Carlo.
Contribution
The paper proposes a new class of Bellman operators that interpolate between bootstrapping and rollout approaches. It provides theoretical guarantees on the estimator's error (variance) and its sample complexity.
Findings
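Estimator's error upper bound approaches TD's optimal variance, plus a term depending on the exit probability of the selected subset
Sample complexity depends only on the occupancy measure of the selected subset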
Lower bound shows the additional variance term is unavoidable
Abstract
Bootstrapping and rollout are two fundamental principles for value function estimation in reinforcement learning (RL). We introduce a novel class of Bellman operators, called subgraph Bellman operators, that interpolate between bootstrapping and rollout methods. Our estimator, derived by solving the fixed point of the empirical subgraph Bellman operator, combines the strengths of the bootstrapping-based temporal difference (TD) estimator and the rollout-based Monte Carlo (MC) methods. Specifically, the error upper bound of our estimator approaches the optimal variance achieved by TD, with an additional term depending on the exit probability of a selected subset of the state space. At the same time, the estimator exhibits the finite-sample adaptivity of MC, with sample complexity depending only on the occupancy measure of this subset. We complement the upper bound with an…
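A toy tabular sketch can make the bootstrap/rollout trade-off concrete. It is not the paper's subgraph Bellman operator, whose exact definition is not reproduced here. Instead, it is a hypothetical interpolation built on our own assumptions:

- episodic data given as lists of (state, reward) pairs,
- a terminal value of zero,
- a discount 0 <= gamma < 1, which keeps the linear system invertible,
- a user-chosen subset S.

Values are bootstrapped for states in S. Reward segments outside S are rolled out along the observed trajectory until it re-enters S or terminates. The estimate is the fixed point of the resulting empirical operator. The function name `subset_bootstrap_estimate` is hypothetical.

```python
import numpy as np


def subset_bootstrap_estimate(episodes, subset, gamma):
    """Hypothetical subset-based TD/MC interpolation (illustration only,
    not the paper's subgraph Bellman operator).

    episodes: list of episodes, each a list of (state, reward) pairs in time
              order; the episode terminates after its last pair (value 0).
    subset:   set of states whose values are bootstrapped (the set S).
    gamma:    discount factor, assumed 0 <= gamma < 1.
    Returns a dict mapping each state of S seen in the data to its estimate.
    """
    # Index the states of S that actually appear in the data.
    idx = {}
    for ep in episodes:
        for s, _ in ep:
            if s in subset and s not in idx:
                idx[s] = len(idx)
    n = len(idx)
    counts = np.zeros(n)
    b = np.zeros(n)       # summed discounted reward segments (rollout part)
    P = np.zeros((n, n))  # summed discount weight on the re-entry state (bootstrap part)

    for ep in episodes:
        for t, (s, r) in enumerate(ep):
            if s not in idx:
                continue  # states outside S are only ever rolled through
            i = idx[s]
            counts[i] += 1
            seg, disc, k = r, gamma, t + 1
            # Roll out while outside S; stop at re-entry into S or termination.
            while k < len(ep) and ep[k][0] not in subset:
                seg += disc * ep[k][1]
                disc *= gamma
                k += 1
            b[i] += seg
            if k < len(ep):  # re-entered S: bootstrap on V(s_k)
                P[i, idx[ep[k][0]]] += disc

    # Average over visits to get the empirical operator V -> b + P V.
    b /= counts
    P /= counts[:, None]
    # Its fixed point solves (I - P) V = b; row sums of P are <= gamma < 1.
    V = np.linalg.solve(np.eye(n) - P, b)
    return {s: float(V[i]) for s, i in idx.items()}
```

The two extremes of this sketch correspond to the classical methods:

- With S equal to every visited state, no segment is rolled out, and the result is the batch TD(0) (certainty-equivalence) fixed point.
- With S containing only a start state that is never revisited, it reduces to averaging Monte Carlo returns.

Intermediate choices of S trade between the two. The guarantees stated in the abstract concern the paper's own operator, not this sketch.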
Taxonomy
Topics: Simulation Techniques and Applications
