Predictive Lagrangian Optimization for Constrained Reinforcement Learning
Tianqi Zhang, Puzhen Yuan, Guojian Zhan, Ziyu Lin, Yao Lyu, Zhenzhi Qin, Jingliang Duan, Liping Zhang, Shengbo Eben Li

TL;DR
This paper connects constrained reinforcement learning (RL) with feedback control systems and, from that connection, develops the predictive Lagrangian optimization (PLO) algorithm, which outperforms traditional PID-based methods.
Contribution
It establishes a general equivalence framework between constrained RL and feedback control, and proposes the PLO algorithm using model predictive control for improved performance.
Findings
PLO achieves a feasible region up to 7.2% larger.
PLO maintains comparable average reward to existing methods.
Framework unifies various feedback controllers for constrained RL.
Abstract
Constrained optimization is widely used in reinforcement learning (RL) to address complex control tasks. From the perspective of dynamical systems, iteratively solving a constrained optimization problem can be framed as the temporal evolution of a feedback control system. Classical constrained optimization methods, such as penalty and Lagrangian approaches, inherently use proportional and integral feedback controllers, respectively. In this paper, we propose a more general equivalence framework that connects constrained optimization with feedback control systems, with the aim of developing more effective constrained RL algorithms. First, we define each step of the system evolution as determining the Lagrange multiplier by solving a multiplier feedback optimal control problem (MFOCP). In this problem, the control input is the multiplier, the state is the policy parameters, the dynamics is…
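The feedback-control reading of penalty and Lagrangian methods can be illustrated on a toy problem. The sketch below is not the paper's PLO algorithm or its MFOCP; it only shows a multiplier produced by a PID-style controller acting on the constraint violation. The toy problem (minimize (x - 2)^2 subject to x - 1 <= 0), the class and function names, the gains, and the clipping of the multiplier at zero are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PIDMultiplier:
    """Hypothetical PID-style controller whose output is the multiplier."""
    kp: float = 0.0  # proportional gain (penalty-like behaviour)
    ki: float = 0.0  # integral gain (classical Lagrangian dual ascent)
    kd: float = 0.0  # derivative gain
    integral: float = 0.0
    prev_violation: float = 0.0

    def update(self, violation: float) -> float:
        # Integral term accumulates violation; clipped so it stays non-negative.
        self.integral = max(0.0, self.integral + self.ki * violation)
        # Derivative term reacts only to growing violation (an assumed convention).
        derivative = max(0.0, violation - self.prev_violation)
        self.prev_violation = violation
        # Multiplier is the clipped sum of the three feedback terms.
        return max(0.0, self.kp * violation + self.integral + self.kd * derivative)


def solve_toy(controller: PIDMultiplier, steps: int = 5000, lr: float = 0.01):
    """Minimize (x - 2)^2 subject to x - 1 <= 0 by gradient descent on
    the Lagrangian, with the multiplier supplied by the controller."""
    x, lam = 0.0, 0.0
    for _ in range(steps):
        # Gradient of (x - 2)^2 + lam * (x - 1) with respect to x.
        grad = 2.0 * (x - 2.0) + lam
        x -= lr * grad
        # Feedback step: the constraint violation g(x) = x - 1 drives the multiplier.
        lam = controller.update(x - 1.0)
    return x, lam


# Integral-only controller: behaves like the classical Lagrangian method.
x_i, lam_i = solve_toy(PIDMultiplier(ki=0.05))
# Proportional-only controller: behaves like a penalty method.
x_p, lam_p = solve_toy(PIDMultiplier(kp=100.0))
print(x_i, lam_i)
print(x_p, lam_p)
```

In this toy setting the integral-only controller drives x to the constrained optimum x = 1 with multiplier close to 2, while the proportional-only controller settles at x = 104/102, slightly outside the feasible set. That residual violation is the familiar steady-state error of pure proportional feedback.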
Taxonomy
Topics: Reinforcement Learning in Robotics · Scheduling and Optimization Algorithms · Traffic control and management
