MPC-based Reinforcement Learning for Economic Problems with Application to Battery Storage
Arash Bahari Kordabad, Wenqi Cai, Sebastien Gros

TL;DR
This paper introduces a homotopy-based MPC reinforcement learning approach for economic control problems, demonstrating improved learning efficiency in battery storage applications with nearly bang-bang policies.
Contribution
It proposes a novel homotopy strategy to enhance policy gradient methods for bang-bang structured policies in MPC-based reinforcement learning.
Findings
Faster convergence compared to classical policy gradient methods.
Effective handling of bang-bang policy structures.
Successful application to battery storage control problem.
Abstract
In this paper, we are interested in optimal control problems with purely economic costs, which often yield optimal policies having a (nearly) bang-bang structure. We focus on policy approximations based on Model Predictive Control (MPC) and the use of the deterministic policy gradient method to optimize the MPC closed-loop performance in the presence of unmodelled stochasticity or model error. When the policy has a (nearly) bang-bang structure, we observe that the policy gradient method can struggle to produce meaningful steps in the policy parameters. To tackle this issue, we propose a homotopy strategy based on the interior-point method, providing a relaxation of the policy during learning. We investigate a specific well-known battery storage problem, and show that the proposed method delivers more homogeneous and faster learning than a classical policy gradient approach.
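To illustrate why a bang-bang policy can stall a policy gradient method, and how an interior-point relaxation helps, here is a minimal sketch on a toy scalar problem. It is not the paper's MPC scheme or its battery model. It assumes a one-step problem min c*u subject to -1 <= u <= 1, whose solution u = -sign(c) is bang-bang and has zero sensitivity to c almost everywhere. Replacing the bounds with a log barrier weighted by tau > 0 gives a smooth policy with a closed form. The names relaxed_policy, policy_sensitivity, c, and tau are hypothetical and chosen for this example only.

```python
import math


def relaxed_policy(c, tau):
    """Minimizer of c*u - tau*(log(1 - u) + log(1 + u)) over -1 < u < 1.

    Setting the derivative to zero gives c*(1 - u**2) + 2*tau*u = 0, whose
    root inside (-1, 1) is u = -c / (tau + sqrt(tau**2 + c**2)).
    Requires tau > 0.
    """
    r = math.sqrt(tau * tau + c * c)
    return -c / (tau + r)


def policy_sensitivity(c, tau):
    """Closed-form derivative du/dc of the relaxed policy (tau > 0)."""
    r = math.sqrt(tau * tau + c * c)
    return -tau / (r * (tau + r))


if __name__ == "__main__":
    c = 0.5  # toy cost coefficient, an assumption for this illustration
    # Homotopy-style schedule: shrink the barrier weight step by step.
    for tau in [1.0, 0.1, 0.01, 0.001]:
        u = relaxed_policy(c, tau)
        du_dc = policy_sensitivity(c, tau)
        print(f"tau={tau:g}  u={u:.4f}  du/dc={du_dc:.4f}")
```

In this toy setting, a large tau yields a smooth policy whose sensitivity to c is clearly nonzero, so a gradient-based update has something to work with. As tau shrinks toward zero, the policy approaches the bang-bang solution -sign(c) and the sensitivity vanishes for c != 0, which mirrors the difficulty described in the abstract.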
