Provably Efficient Offline-to-Online Value Adaptation with General Function Approximation

Shangzhe Li; Weitong Zhang

arXiv:2604.13966 · cs.LG · April 16, 2026

TL;DR

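This paper introduces O2O-LSVI, an offline-to-online reinforcement learning algorithm that adapts an offline-pretrained value function to the target environment with limited online interaction. Under a structural condition on the pretrained value functions, it provably improves over pure online RL. The theory is complemented by neural-network experiments.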

Contribution

It establishes a minimax lower bound for offline-to-online value adaptation and proposes O2O-LSVI, an adaptation algorithm whose problem-dependent sample complexity provably improves over pure online RL.

Findings

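01. A minimax lower bound shows that, on certain hard instances, online adaptation can be no more efficient than pure online RL, even when the pretrained $Q$-function is close to the optimal $Q^{\star}$.

02. Under a novel structural condition on the offline-pretrained value functions, O2O-LSVI achieves a problem-dependent sample complexity that improves over pure online RL.

03. Neural-network experiments demonstrate the practical effectiveness of the proposed method.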

Abstract

We study value adaptation in offline-to-online reinforcement learning under general function approximation. Starting from an imperfect offline-pretrained $Q$-function, the learner aims to adapt it to the target environment using only a limited amount of online interaction. We first characterize the difficulty of this setting by establishing a minimax lower bound, showing that even when the pretrained $Q$-function is close to the optimal $Q^{\star}$, online adaptation can be no more efficient than pure online RL on certain hard instances. On the positive side, under a novel structural condition on the offline-pretrained value functions, we propose O2O-LSVI, an adaptation algorithm with problem-dependent sample complexity that provably improves over pure online RL. Finally, we complement our theory with neural-network experiments that demonstrate the practical effectiveness of the proposed…
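
The abstract does not describe how O2O-LSVI works internally. As a rough illustration only, here is a minimal sketch of a generic idea in the same family. It runs least-squares value iteration (LSVI) in a tabular, finite-horizon setting and regularizes each regression toward an offline-pretrained $Q$-function, so that scarce online data only nudges the pretrained values. For each pair $(s,a)$ visited at step $h$, it minimizes $\sum_i (x - y_i)^2 + \lambda (x - Q^{\text{pre}}_h(s,a))^2$ with targets $y_i = r_i + \max_b Q_{h+1}(s'_i, b)$. The closed-form minimizer is $x = (\sum_i y_i + \lambda Q^{\text{pre}}_h(s,a)) / (n + \lambda)$. This is not the paper's algorithm: it omits exploration bonuses, the structural condition, and general function approximation. The function name `adapt_q_lsvi`, its data layout, and the weight `lam` are hypothetical.

```python
from collections import defaultdict


def adapt_q_lsvi(q_pre, data, num_actions, lam=1.0):
    """Hypothetical tabular LSVI sketch that shrinks each estimate toward a pretrained Q.

    q_pre: list of length H; q_pre[h] maps (s, a) -> pretrained value at step h.
    data:  list of length H; data[h] is a list of (s, a, r, s_next) online transitions.
    lam:   pull toward q_pre (large lam keeps the pretrained values almost unchanged).
    Not the paper's O2O-LSVI: no exploration bonus, no function approximation.
    """
    horizon = len(q_pre)
    # Start from the pretrained values; unvisited (s, a) pairs keep them.
    q = [dict(q_pre[h]) for h in range(horizon)]
    # Backward induction: step h uses the already-adapted values at step h + 1.
    for h in reversed(range(horizon)):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for s, a, r, s_next in data[h]:
            # Bellman target; the value after the final step is zero.
            if h + 1 < horizon:
                v_next = max(q[h + 1].get((s_next, b), 0.0) for b in range(num_actions))
            else:
                v_next = 0.0
            sums[(s, a)] += r + v_next
            counts[(s, a)] += 1
        for sa, n in counts.items():
            # Closed-form minimizer of sum_i (x - y_i)^2 + lam * (x - q_pre)^2.
            prior = q_pre[h].get(sa, 0.0)
            q[h][sa] = (sums[sa] + lam * prior) / (n + lam)
    return q


# Usage: one step, one state, two actions; three online samples of action 0 with reward 1.
adapted = adapt_q_lsvi([{(0, 0): 0.5, (0, 1): 0.2}], [[(0, 0, 1.0, 0)] * 3], num_actions=2)
print(adapted[0][(0, 0)], adapted[0][(0, 1)])
```

With `lam=1.0`, the three samples move the pretrained value 0.5 to (3 + 0.5) / 4 = 0.875, while the unvisited action keeps its pretrained value 0.2. A larger `lam` trusts the pretrained $Q$-function more, and `lam=0` reduces each step to the plain least-squares (empirical mean) fit.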
