Residuals-based Offline Reinforcement Learning
Qing Zhu, Xian Yu

TL;DR
This paper introduces a residuals-based framework for offline reinforcement learning that explicitly models estimation errors, providing theoretical guarantees and demonstrating effectiveness in a CartPole environment.
Contribution
The paper proposes a novel residuals-based Bellman optimality operator and develops an offline deep Q-learning (DQN) algorithm built on it, with finite-sample guarantees.
Findings
The residuals-based Bellman operator is a contraction mapping.
The fixed point of the operator is asymptotically optimal under certain conditions.
The residuals-based offline DQN performs effectively in a stochastic CartPole environment.
Abstract
Offline reinforcement learning (RL) has received increasing attention for learning policies from previously collected data without interaction with the real environment, which is particularly important in high-stakes applications. While a growing body of work has developed offline RL algorithms, these methods often rely on restrictive assumptions about data coverage and suffer from distribution shift. In this paper, we propose a residuals-based offline RL framework for general state and action spaces. Specifically, we define a residuals-based Bellman optimality operator that explicitly incorporates estimation error in learning transition dynamics into policy optimization by leveraging empirical residuals. We show that this Bellman operator is a contraction mapping and identify conditions under which its fixed point is asymptotically optimal and possesses finite-sample guarantees. We…
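The contraction claim can be illustrated numerically. The listing does not give the operator's exact form, so the sketch below is only one plausible reading of "leveraging empirical residuals", not the authors' definition. It assumes a fitted next-state predictor `f_hat`, a pool of empirical residuals (observed next state minus prediction), and a backup that averages the greedy value over the prediction shifted by each residual. The integer state grid, the clipping projection, and all names (`residual_bellman_backup`, `f_hat`, `residuals`) are hypothetical choices made for illustration.

```python
import numpy as np


def residual_bellman_backup(Q, rewards, f_hat, residuals, gamma):
    """One application of an ASSUMED residuals-based Bellman optimality backup.

    Q         : (n_states, n_actions) action values on an integer state grid
    rewards   : (n_states, n_actions) immediate rewards
    f_hat     : (n_states, n_actions) fitted point prediction of the next state
    residuals : (n_samples,) empirical residuals s' - f_hat(s, a) from the data
    gamma     : discount factor in [0, 1)
    """
    n_states = Q.shape[0]
    # Candidate next states: the point prediction shifted by every residual.
    next_states = f_hat[:, :, None] + residuals[None, None, :]
    # Project onto the grid {0, ..., n_states - 1} (illustrative choice only).
    idx = np.clip(np.rint(next_states), 0, n_states - 1).astype(int)
    # Greedy state value, then its average over the residual-shifted states.
    v = Q.max(axis=1)
    expected_v = v[idx].mean(axis=2)
    return rewards + gamma * expected_v


# Synthetic problem data (hypothetical, not taken from the paper).
rng = np.random.default_rng(0)
n_states, n_actions, n_samples, gamma = 20, 2, 50, 0.9
rewards = rng.uniform(size=(n_states, n_actions))
f_hat = rng.uniform(0, n_states - 1, size=(n_states, n_actions))
residuals = rng.normal(scale=1.5, size=n_samples)

# Contraction check: ||T Q1 - T Q2||_inf <= gamma * ||Q1 - Q2||_inf.
Q1 = rng.normal(size=(n_states, n_actions))
Q2 = rng.normal(size=(n_states, n_actions))
TQ1 = residual_bellman_backup(Q1, rewards, f_hat, residuals, gamma)
TQ2 = residual_bellman_backup(Q2, rewards, f_hat, residuals, gamma)
lhs = np.abs(TQ1 - TQ2).max()
rhs = gamma * np.abs(Q1 - Q2).max()
contraction_holds = bool(lhs <= rhs + 1e-12)

# Fixed-point iteration: repeated backups converge because T is a contraction.
Q = np.zeros((n_states, n_actions))
for _ in range(200):
    Q = residual_bellman_backup(Q, rewards, f_hat, residuals, gamma)
fixed_point_gap = np.abs(
    residual_bellman_backup(Q, rewards, f_hat, residuals, gamma) - Q
).max()
```

Under these assumptions the backup is an average of greedy values evaluated at shifted states, so the sup-norm bound holds for any pair `Q1`, `Q2`, and iterating from zero drives `fixed_point_gap` toward zero. A deep Q-learning variant would replace the tabular `Q` with a network and regress onto the same kind of residual-averaged target.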
