Bellman Error Centering
Xingguo Chen, Yu Gong, Shangdong Yang, Wenhao Wang

TL;DR
This paper clarifies the concept of Bellman error centering in reinforcement learning, introduces new algorithms based on it, and proves their convergence and stability.
Contribution
It establishes Bellman error centering as a unifying framework, develops new on-policy and off-policy algorithms, and provides convergence proofs and empirical validation.
Findings
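A centered fixpoint is derived for tabular value functions, along with a centered TD fixpoint for linear value function approximation.
Convergence is proved for both the on-policy CTD algorithm and the off-policy CTDC algorithm.
Experiments confirm the stability of the proposed algorithms.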
Abstract
This paper revisits the recently proposed reward centering algorithms including simple reward centering (SRC) and value-based reward centering (VRC), and points out that SRC is indeed the reward centering, while VRC is essentially Bellman error centering (BEC). Based on BEC, we provide the centered fixpoint for tabular value functions, as well as the centered TD fixpoint for linear value function approximation. We design the on-policy CTD algorithm and the off-policy CTDC algorithm, and prove the convergence of both algorithms. Finally, we experimentally validate the stability of our proposed algorithms. Bellman error centering facilitates the extension to various reinforcement learning algorithms.
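As a rough illustration of the distinction drawn in the abstract, the Python sketch below contrasts the two centering styles on a hypothetical two-state cycle. The update rules are assumptions based on how reward centering is usually presented: a running average of rewards for SRC, and an offset driven by the TD error for VRC. They are not taken from this page, and this is not the paper's CTD or CTDC algorithm. The function name `run_td`, the toy chain, and all step sizes are invented for the example.

```python
def run_td(mode, steps=5000, gamma=0.9, alpha=0.1, beta=0.01, eta=1.0):
    """Tabular TD(0) with a centering offset on a toy cycle 0 -> 1 -> 0.

    mode="src": offset tracks a running average of the rewards (assumed SRC form).
    mode="vrc": offset is moved by the TD error itself (assumed VRC form).
    Returns the final offset, the value table, and the last TD error.
    """
    rewards = [1.0, 3.0]  # hypothetical rewards; their long-run average is 2.0
    V = [0.0, 0.0]        # tabular value estimates
    rbar = 0.0            # centering offset
    s = 0
    delta = 0.0
    for _ in range(steps):
        r, s_next = rewards[s], 1 - s
        if mode == "src":
            # Center the reward: the offset only ever sees rewards.
            rbar += beta * (r - rbar)
        # TD error computed on the centered reward.
        delta = r - rbar + gamma * V[s_next] - V[s]
        V[s] += alpha * delta
        if mode == "vrc":
            # Center the Bellman (TD) error: the offset is updated with delta.
            rbar += eta * alpha * delta
        s = s_next
    return rbar, V, delta


if __name__ == "__main__":
    for mode in ("src", "vrc"):
        rbar, V, delta = run_td(mode)
        print(mode, "offset:", rbar, "values:", V, "last TD error:", delta)
```

In this toy setting the SRC offset hovers near the average reward, while the VRC offset settles wherever the TD errors vanish. That point depends on the initial values and generally differs from the average reward, which is consistent with reading VRC as centering the Bellman error rather than the reward.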
Taxonomy
Topics: Neural Networks and Applications
