Residuals-based Offline Reinforcement Learning

Qing Zhu; Xian Yu

arXiv:2604.01378·cs.LG·April 3, 2026

TL;DR

This paper introduces a residuals-based framework for offline reinforcement learning that explicitly accounts for estimation error in the learned transition dynamics. It provides theoretical guarantees for the resulting Bellman operator and demonstrates the approach in a stochastic CartPole environment.

Contribution

It proposes a novel residuals-based Bellman operator and develops an offline deep Q-learning algorithm with finite-sample guarantees.

Findings

01 The residuals-based Bellman operator is a contraction mapping.

02 The fixed point of the operator is asymptotically optimal under certain conditions.

03 The residuals-based offline DQN performs effectively in a stochastic CartPole environment.
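
For readers unfamiliar with the term, the contraction property in the first finding can be written out as follows. This is the standard textbook form, and it assumes the sup norm and a modulus equal to the discount factor \gamma, as for the classical Bellman optimality operator; this page does not state which norm or modulus the paper uses, and the symbols T, Q_1, Q_2, Q^* are notation chosen here for illustration.

```latex
\| T Q_1 - T Q_2 \|_\infty \;\le\; \gamma \, \| Q_1 - Q_2 \|_\infty
\quad \text{for all bounded } Q_1, Q_2, \qquad 0 \le \gamma < 1 .

% Consequence (Banach fixed-point theorem): T has a unique fixed point Q^*,
% and repeated application converges to it geometrically.
\| T^k Q_0 - Q^* \|_\infty \;\le\; \gamma^k \, \| Q_0 - Q^* \|_\infty .
```

Under that assumption, the second finding concerns how close this unique fixed point is to the optimal value function of the true environment.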

Abstract

Offline reinforcement learning (RL) has received increasing attention for learning policies from previously collected data without interaction with the real environment, which is particularly important in high-stakes applications. While a growing body of work has developed offline RL algorithms, these methods often rely on restrictive assumptions about data coverage and suffer from distribution shift. In this paper, we propose a residuals-based offline RL framework for general state and action spaces. Specifically, we define a residuals-based Bellman optimality operator that explicitly incorporates estimation error in learning transition dynamics into policy optimization by leveraging empirical residuals. We show that this Bellman operator is a contraction mapping and identify conditions under which its fixed point is asymptotically optimal and possesses finite-sample guarantees. We…
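
The idea of folding empirical residuals into a Bellman backup can be illustrated with a small sketch. It is not the paper's operator, whose exact definition is not given on this page. It assumes a scalar state on an integer grid, a linear least-squares dynamics model, a known reward function, and a tabular Q-function; all names (`fit_linear_dynamics`, `residual_backup`, `reward`) are hypothetical. The backup replaces the unknown next-state distribution with the fitted prediction plus each empirical residual, then averages.

```python
import numpy as np

def fit_linear_dynamics(S, A, S_next):
    """Least-squares fit of s' ~ w0 + w1*s + w2*a; returns weights and residuals."""
    X = np.column_stack([np.ones_like(S), S, A])
    w, *_ = np.linalg.lstsq(X, S_next, rcond=None)
    residuals = S_next - X @ w  # empirical residuals of the fitted model
    return w, residuals

def reward(s, a):
    # Assumed known reward for this toy problem: stay close to state 5.
    return -abs(s - 5.0)

def residual_backup(Q, w, residuals, states, actions, gamma):
    """One backup: average max_a' Q over (prediction + each residual)."""
    n_states = len(states)
    v = Q.max(axis=1)  # greedy value per grid state
    new_Q = np.empty_like(Q)
    for i, s in enumerate(states):
        for j, a in enumerate(actions):
            pred = w[0] + w[1] * s + w[2] * a
            nxt = pred + residuals  # one scenario per residual
            # Project scenarios onto the grid (an assumption of this sketch).
            idx = np.clip(np.rint(nxt), 0, n_states - 1).astype(int)
            new_Q[i, j] = reward(s, a) + gamma * v[idx].mean()
    return new_Q

# Synthetic offline dataset: s' = s + a + Gaussian noise (toy assumption).
rng = np.random.default_rng(0)
states = np.arange(11, dtype=float)  # grid index equals state value
actions = np.array([-1.0, 1.0])
S = rng.integers(0, 11, size=200).astype(float)
A = rng.choice(actions, size=200)
S_next = S + A + rng.normal(0.0, 0.5, size=200)

gamma = 0.9
w, residuals = fit_linear_dynamics(S, A, S_next)

# Numerical check of the sup-norm contraction on two random Q-tables.
Q1 = rng.normal(size=(len(states), len(actions)))
Q2 = rng.normal(size=(len(states), len(actions)))
lhs = np.abs(residual_backup(Q1, w, residuals, states, actions, gamma)
             - residual_backup(Q2, w, residuals, states, actions, gamma)).max()
rhs = gamma * np.abs(Q1 - Q2).max()

# Fixed-point iteration from zero.
Q_star = np.zeros((len(states), len(actions)))
for _ in range(500):
    Q_star = residual_backup(Q_star, w, residuals, states, actions, gamma)
fixed_point_gap = np.abs(
    residual_backup(Q_star, w, residuals, states, actions, gamma) - Q_star
).max()
```

In this toy version the backup is an average of maxima of Q over projected next states, so it contracts in the sup norm with modulus `gamma`, and `lhs` should not exceed `rhs`. A deep Q-learning variant would replace the table with a network trained toward the same averaged target, which is beyond this sketch.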
