Bellman Error Centering

Xingguo Chen; Yu Gong; Shangdong Yang; Wenhao Wang

arXiv:2502.03104 · cs.LG · February 6, 2025

TL;DR

This paper shows that value-based reward centering is essentially Bellman error centering, builds new reinforcement learning algorithms on this idea, proves their convergence, and verifies their stability in experiments.

Contribution

It derives centered fixpoints for tabular and linear value functions and develops the on-policy CTD and off-policy CTDC algorithms, with convergence proofs and empirical validation.

Findings

01

The centered fixpoint for tabular value functions is derived.

02

The convergence of the on-policy CTD and off-policy CTDC algorithms is proved.

03

Experimental results confirm the stability of the proposed algorithms.
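To make the on-policy algorithm more concrete, here is a minimal Python sketch of centered TD with linear features. It assumes, without the paper's exact pseudocode at hand, that CTD keeps a scalar estimate `omega` of the mean TD (Bellman) error, subtracts it from each TD error, and uses the centered error in an otherwise ordinary linear TD(0) step. The function name `centered_linear_td`, the step sizes, and the toy Markov reward process (`P`, `R`, `Phi`) are hypothetical choices for illustration; the off-policy CTDC algorithm is not sketched.

```python
import numpy as np


def centered_linear_td(P, R, Phi, gamma, alpha=0.003, beta=0.003,
                       steps=200_000, seed=0):
    """On-policy linear TD(0) with a scalar Bellman-error offset (sketch).

    Phi[s] is the feature vector of state s, so the (centered) value estimate
    is Phi[s] @ theta. Assumed updates per sampled transition (s, r, s'):
        delta  = r + gamma * Phi[s'] @ theta - Phi[s] @ theta
        theta += alpha * (delta - omega) * Phi[s]
        omega += beta  * (delta - omega)
    """
    rng = np.random.default_rng(seed)
    k = Phi.shape[1]
    cum = np.cumsum(P, axis=1)
    cum[:, -1] = 1.0                    # guard against round-off in row sums
    u = rng.random(steps)               # uniforms used to sample next states
    theta = np.zeros(k)
    omega = 0.0                         # running estimate of the mean TD error
    s = 0
    for t in range(steps):
        s_next = int(np.searchsorted(cum[s], u[t], side="right"))
        phi, phi_next = Phi[s], Phi[s_next]
        delta = R[s] + gamma * (phi_next @ theta) - phi @ theta
        centered = delta - omega        # TD error after subtracting the offset
        theta += alpha * centered * phi
        omega += beta * centered
        s = s_next
    return theta, omega


if __name__ == "__main__":
    # Hypothetical 3-state chain with 2 features and no constant feature.
    P = np.array([[0.1, 0.6, 0.3],
                  [0.4, 0.2, 0.4],
                  [0.5, 0.3, 0.2]])
    R = np.array([1.0, 0.0, 0.5])
    Phi = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])
    theta, omega = centered_linear_td(P, R, Phi, gamma=0.9)
    print("theta =", np.round(theta, 3), "omega =", round(float(omega), 3))
```

In this sketch `omega` should settle near the average TD error of the current estimate `Phi @ theta` under the on-policy state distribution, which is what removes the common offset from the learned values. With one-hot features (`Phi = np.eye(n)`) the same updates reduce to a tabular form of value-based reward centering.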

Abstract

This paper revisits the recently proposed reward centering algorithms, including simple reward centering (SRC) and value-based reward centering (VRC), and points out that SRC is indeed reward centering, while VRC is essentially Bellman error centering (BEC). Based on BEC, we provide the centered fixpoint for tabular value functions, as well as the centered TD fixpoint for linear value function approximation. We design the on-policy CTD algorithm and the off-policy CTDC algorithm, and prove the convergence of both algorithms. Finally, we experimentally validate the stability of our proposed algorithms. Bellman error centering facilitates the extension to various reinforcement learning algorithms.
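The difference between SRC and VRC can be sketched with tabular TD(0) policy evaluation. In this hypothetical Python sketch, both variants subtract a learned scalar `c` from the TD error; SRC moves `c` toward the observed reward, while VRC moves it by the centered TD error (that is, toward the ordinary TD error), so `c` tracks the mean Bellman error rather than the mean reward. The update forms follow the usual reward-centering setup and are assumptions here, as are the function name `centered_td0`, the step sizes, and the toy three-state chain.

```python
import numpy as np


def centered_td0(P, R, gamma, mode, alpha=0.005, beta=0.005, steps=200_000, seed=0):
    """Tabular TD(0) policy evaluation with a learned scalar offset c (sketch).

    mode="src": c averages observed rewards (simple reward centering).
    mode="vrc": c averages TD errors (value-based reward centering,
                read here as Bellman error centering).
    P[s, s'] are on-policy transition probabilities and R[s] is the reward
    received when leaving state s. Returns (V, c).
    """
    if mode not in ("src", "vrc"):
        raise ValueError(f"unknown mode: {mode!r}")
    rng = np.random.default_rng(seed)
    n = len(R)
    cum = np.cumsum(P, axis=1)
    cum[:, -1] = 1.0                    # guard against round-off in row sums
    u = rng.random(steps)               # uniforms used to sample next states
    V = np.zeros(n)
    c = 0.0
    s = 0
    for t in range(steps):
        s_next = int(np.searchsorted(cum[s], u[t], side="right"))
        r = R[s]
        delta = r + gamma * V[s_next] - V[s]   # ordinary TD error
        centered = delta - c                   # TD error minus the offset
        V[s] += alpha * centered
        if mode == "src":
            c += beta * (r - c)         # move c toward the observed reward
        else:
            c += beta * centered        # move c toward the ordinary TD error
        s = s_next
    return V, c


if __name__ == "__main__":
    # Hypothetical three-state Markov reward process (rows of P sum to 1).
    P = np.array([[0.1, 0.6, 0.3],
                  [0.4, 0.2, 0.4],
                  [0.5, 0.3, 0.2]])
    R = np.array([1.0, 0.0, 0.5])
    gamma = 0.9
    V_true = np.linalg.solve(np.eye(3) - gamma * P, R)  # uncentered values
    for mode in ("src", "vrc"):
        V, c = centered_td0(P, R, gamma, mode)
        print(mode, "c =", round(float(c), 3), "V - V_true =", np.round(V - V_true, 2))
```

In both variants the learned `V` should differ from `V_true` by roughly the same constant in every state, since subtracting a constant from every reward shifts all discounted values by that constant divided by `1 - gamma`; with constant step sizes the match is only approximate.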

Taxonomy

Topics: Neural Networks and Applications