Convex Programs and Lyapunov Functions for Reinforcement Learning: A Unified Perspective on the Analysis of Value-Based Methods
Xingang Guo, Bin Hu

TL;DR
This paper introduces a unified control-theoretic framework for analyzing value-based reinforcement learning methods using convex programs and Lyapunov functions, revealing deep connections with control systems.
Contribution
The paper applies existing convex testing conditions from control theory to analyze value-based RL algorithms and derive convergence results for them, bridging RL and control theory.
Findings
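Existing convex testing conditions from control theory can be applied directly to value-based methods (VC, VI, and TD learning with linear function approximation) to derive convergence results.
The testing conditions are linear programs (LPs) or semidefinite programs (SDPs), and solving them constructs Lyapunov functions.
The analysis reveals connections between feedback control systems and RL algorithms.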
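As a rough illustration of an LP-type testing condition, the sketch below assumes value computation takes the standard policy-evaluation form V_{k+1} = r + gamma * P V_k, so the error obeys e_{k+1} = gamma * P e_k. The names `P`, `gamma`, `p`, and `rho` are illustrative assumptions and are not taken from the paper, and the condition shown (find p > 0 with A p <= rho * p for a nonnegative matrix A) is a textbook positive-systems test, not necessarily the exact program the authors use.

```python
import numpy as np

def linear_certificate(A, p):
    """Return the smallest rho with A p <= rho * p (elementwise), given p > 0."""
    return float(np.max((A @ p) / p))

def weighted_max_norm(e, p):
    """Candidate Lyapunov function V(e) = max_i |e_i| / p_i."""
    return float(np.max(np.abs(e) / p))

rng = np.random.default_rng(0)
n, gamma = 4, 0.9  # hypothetical state count and discount factor

# Random row-stochastic transition matrix (assumed, for illustration only).
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)

A = gamma * P    # error dynamics e_{k+1} = A e_k, with A entrywise nonnegative
p = np.ones(n)   # feasible point of the LP: A p = gamma * p < p
rho = linear_certificate(A, p)

# Check numerically that V contracts by the factor rho along a trajectory.
e = rng.standard_normal(n)
decreases = []
for _ in range(10):
    e_next = A @ e
    decreases.append(
        weighted_max_norm(e_next, p) <= rho * weighted_max_norm(e, p) + 1e-12
    )
    e = e_next

print(rho, all(decreases))
```

With `p` equal to the all-ones vector, V is the infinity norm and the certified rate is the discount factor, which matches the familiar contraction argument for policy evaluation.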
Abstract
Value-based methods play a fundamental role in Markov decision processes (MDPs) and reinforcement learning (RL). In this paper, we present a unified control-theoretic framework for analyzing value-based methods such as value computation (VC), value iteration (VI), and temporal difference (TD) learning (with linear function approximation). Built upon an intrinsic connection between value-based methods and dynamic systems, we can directly use existing convex testing conditions in control theory to derive various convergence results for the aforementioned value-based methods. These testing conditions are convex programs in the form of either linear programming (LP) or semidefinite programming (SDP), and can be solved to construct Lyapunov functions in a straightforward manner. Our analysis reveals some intriguing connections between feedback control systems and RL algorithms. It is our hope…
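To make the SDP-type condition concrete, here is a minimal sketch that builds a quadratic Lyapunov function V(e) = e^T X e for assumed error dynamics e_{k+1} = A e_k with A = gamma * P. The matrix inequality A^T X A - rho^2 X < 0 with X > 0 is the standard discrete-time test; rather than calling an SDP solver, the sketch solves the corresponding Lyapunov equation with NumPy, which yields one feasible X whenever rho exceeds the spectral radius of A. The names `P`, `gamma`, `rho`, and `quadratic_lyapunov` are hypothetical and not from the paper.

```python
import numpy as np

def quadratic_lyapunov(A, rho):
    """Solve (A/rho)^T X (A/rho) - X = -I for X.

    Requires rho > spectral radius of A; the result then satisfies
    X > 0 and A^T X A - rho^2 X = -rho^2 I < 0.
    """
    n = A.shape[0]
    As = A / rho
    # Row-major vec identity: vec(As^T X As) = kron(As^T, As^T) vec(X).
    M = np.eye(n * n) - np.kron(As.T, As.T)
    X = np.linalg.solve(M, np.eye(n).flatten()).reshape(n, n)
    return 0.5 * (X + X.T)  # symmetrize against round-off

rng = np.random.default_rng(1)
n, gamma, rho = 4, 0.9, 0.95  # hypothetical sizes and rates, gamma < rho < 1

# Random row-stochastic transition matrix (assumed, for illustration only).
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)
A = gamma * P

X = quadratic_lyapunov(A, rho)
min_eig_X = float(np.linalg.eigvalsh(X).min())
max_eig_lmi = float(np.linalg.eigvalsh(A.T @ X @ A - rho**2 * X).max())

print(min_eig_X > 0, max_eig_lmi < 0)
```

A positive `min_eig_X` together with a negative `max_eig_lmi` certifies V(e_{k+1}) <= rho^2 V(e_k), that is, linear convergence of the error at rate rho under these assumptions.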
Taxonomy
Topics: Health Systems, Economic Evaluations, Quality of Life
