Bellman Residual Minimization for Control: Geometry, Stationarity, and Convergence
Donghwan Lee, Hyukjun Yang

TL;DR
This paper investigates Bellman residual minimization for control in Markov decision problems, highlighting its theoretical foundations, advantages, and potential for stable convergence in policy optimization.
Contribution
It establishes foundational results for Bellman residual minimization applied to control, an area that has been explored far less than policy evaluation.
Findings
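Bellman residual minimization can offer more stable convergence than dynamic programming when value functions are represented with function approximation.
The paper analyzes the geometry, stationary points, and convergence of the control Bellman residual objective.
It discusses the method's practical challenges (lower efficiency, harder extension to model-free reinforcement learning) alongside its potential advantages over dynamic programming.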
Abstract
Markov decision problems are most commonly solved via dynamic programming. Another approach is Bellman residual minimization, which directly minimizes the squared Bellman residual objective function. However, compared to dynamic programming, this approach has received relatively less attention, mainly because it is often less efficient in practice and can be more difficult to extend to model-free settings such as reinforcement learning. Nonetheless, Bellman residual minimization has several advantages that make it worth investigating, such as more stable convergence with function approximation for value functions. While Bellman residual methods for policy evaluation have been widely studied, methods for policy optimization (control tasks) have been scarcely explored. In this paper, we establish foundational results for control Bellman residual minimization in policy optimization.
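To make the control objective concrete, here is a minimal sketch, assuming a small tabular MDP with a known transition tensor P[s, a, s'] and reward table R[s, a], and uniform weighting over state-action pairs. It minimizes f(Q) = 0.5 * ||TQ - Q||^2, where T is the Bellman optimality operator (TQ)(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) * max_a' Q(s', a'), using gradient descent with Armijo backtracking, and compares the result with value iteration (the dynamic-programming route). The optimizer, the uniform weights, and all function names are illustrative assumptions, not the paper's algorithm. Because of the max, f is nonsmooth; the gradient below is valid only where each state has a unique greedy action.

```python
import numpy as np


def bellman_optimality(Q, R, P, gamma):
    """(TQ)(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) * max_a' Q(s', a')."""
    return R + gamma * P @ Q.max(axis=1)


def residual_objective(Q, R, P, gamma):
    """Squared control Bellman residual, uniformly weighted: 0.5 * ||TQ - Q||^2."""
    d = bellman_optimality(Q, R, P, gamma) - Q
    return 0.5 * float(np.sum(d * d))


def residual_gradient(Q, R, P, gamma):
    """Gradient of the objective, valid where every state's greedy action is unique."""
    S = Q.shape[0]
    d = bellman_optimality(Q, R, P, gamma) - Q
    pi = Q.argmax(axis=1)                    # greedy action in each state
    w = np.einsum("sat,sa->t", P, d)         # w(t) = sum_{s,a} P(t | s, a) d(s, a)
    g = -d                                   # contribution of the "- Q" term
    g[np.arange(S), pi] += gamma * w         # contribution of the max inside T
    return g


def minimize_residual(Q0, R, P, gamma, iters=2000):
    """Gradient descent with Armijo backtracking, so the objective never increases."""
    Q = Q0.copy()
    f = residual_objective(Q, R, P, gamma)
    for _ in range(iters):
        g = residual_gradient(Q, R, P, gamma)
        gg = float(np.sum(g * g))
        if gg < 1e-24:
            break
        step = 1.0
        while step > 1e-12:
            Q_new = Q - step * g
            f_new = residual_objective(Q_new, R, P, gamma)
            if f_new <= f - 0.5 * step * gg:  # sufficient decrease
                break
            step *= 0.5
        else:
            break                             # no acceptable step found: stop
        Q, f = Q_new, f_new
    return Q


def value_iteration(R, P, gamma, iters=1000):
    """Dynamic-programming baseline: repeatedly apply Q <- TQ."""
    Q = np.zeros_like(R)
    for _ in range(iters):
        Q = bellman_optimality(Q, R, P, gamma)
    return Q


def random_mdp(S=4, A=2, seed=0):
    """Hypothetical random MDP used only for illustration."""
    rng = np.random.default_rng(seed)
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)         # each P[s, a, :] is a distribution
    R = rng.random((S, A))
    return R, P


if __name__ == "__main__":
    R, P = random_mdp()
    gamma = 0.9
    Q0 = np.random.default_rng(1).normal(size=R.shape)
    Q_bm = minimize_residual(Q0, R, P, gamma)
    Q_vi = value_iteration(R, P, gamma)
    print("f(Q0)          =", residual_objective(Q0, R, P, gamma))
    print("f(Q_bm)        =", residual_objective(Q_bm, R, P, gamma))
    print("max|Q_bm-Q_vi| =", np.abs(Q_bm - Q_vi).max())
```

The backtracking rule only guarantees that the objective never increases. Since f is nonsmooth and need not be convex, reaching the value-iteration solution is not guaranteed by this sketch; questions of that kind (the geometry and stationary points of the objective) are what the paper studies.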
