Reinforcement Learning for a Discrete-Time Linear-Quadratic Control Problem with an Application
Lucky Li

TL;DR
This paper applies reinforcement learning to a discrete-time linear-quadratic control problem, proving that the optimal exploratory policy is Gaussian. It then extends the approach to a mean-variance asset-liability management problem, with proven policy improvement and convergence.
Contribution
It introduces an entropy-regularized reinforcement learning framework for discrete-time LQ control, proves that the optimal feedback policy is Gaussian, and applies the method to mean-variance asset-liability management with policy-improvement and convergence guarantees.
Findings
Optimal policies are Gaussian in the RL framework.
The RL algorithm converges and improves policies in the financial application.
Numerical simulations validate the theoretical results.
Abstract
We study the discrete-time linear-quadratic (LQ) control model using reinforcement learning (RL). Using entropy to measure the cost of exploration, we prove that the optimal feedback policy for the problem must be of Gaussian type. We then apply the results for the discrete-time LQ model to solve the discrete-time mean-variance asset-liability management problem and prove the policy improvement and convergence of our RL algorithm. Finally, a numerical example illustrates the established theoretical results through simulations.
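Why entropy regularization produces a Gaussian policy can be seen on a toy problem. The sketch below is not the paper's model or algorithm. It assumes a hypothetical scalar, finite-horizon system x_{t+1} = a x_t + b u_t + w_t with Gaussian noise w_t of variance sigma2, running cost q x² + r u², terminal cost q_T x², and an exploration penalty λ·E[log π] (the negative entropy weighted by a temperature λ). Under these assumptions the Bellman minimization over action densities gives π_t(u|x) ∝ exp(-Q_t(x,u)/λ). Because Q_t is quadratic in u, this density is Gaussian. All names (`ScalarLQ`, `solve_gaussian_policy`, `sample_action`) are illustrative.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ScalarLQ:
    """Hypothetical scalar LQ model with entropy-regularized exploration."""
    a: float       # state coefficient
    b: float       # control coefficient
    q: float       # running state-cost weight
    r: float       # running control-cost weight (> 0)
    q_T: float     # terminal state-cost weight
    sigma2: float  # variance of the additive Gaussian noise
    lam: float     # exploration temperature lambda (> 0)
    T: int         # horizon length


def solve_gaussian_policy(m: ScalarLQ):
    """Backward recursion for the value function V_t(x) = p_t x^2 + c_t.

    Returns (gains, variances, p_0, c_0) such that the optimal policy at time t
    is N(-gains[t] * x, variances[t]).
    """
    p, c = m.q_T, 0.0  # terminal value function V_T(x) = q_T x^2
    gains = [0.0] * m.T
    variances = [0.0] * m.T
    for t in reversed(range(m.T)):
        h = m.r + p * m.b ** 2                # curvature of Q_t(x, u) in u
        gains[t] = p * m.a * m.b / h          # policy mean is -gains[t] * x
        variances[t] = m.lam / (2.0 * h)      # exploration variance, linear in lam
        # Constant term: noise cost plus the log-partition term -(lam/2) log(pi*lam/h).
        c = c + p * m.sigma2 - 0.5 * m.lam * math.log(math.pi * m.lam / h)
        # Quadratic coefficient: the classical Riccati update (independent of lam).
        p = m.q + p * m.a ** 2 - (p * m.a * m.b) ** 2 / h
    return gains, variances, p, c


def sample_action(gains, variances, t, x, rng):
    """Draw an exploratory action from the Gaussian policy at time t."""
    return rng.gauss(-gains[t] * x, math.sqrt(variances[t]))


if __name__ == "__main__":
    model = ScalarLQ(a=1.0, b=1.0, q=1.0, r=1.0, q_T=1.0, sigma2=0.1, lam=0.2, T=2)
    gains, variances, p0, c0 = solve_gaussian_policy(model)
    rng = random.Random(0)
    x = 1.0
    for t in range(model.T):
        u = sample_action(gains, variances, t, x, rng)
        x = model.a * x + model.b * u + rng.gauss(0.0, math.sqrt(model.sigma2))
    print("gains:", gains, "variances:", variances)
```

In this toy setting, the policy mean uses the classical Riccati gain and does not depend on λ. The exploration variance λ / (2(r + p b²)) grows linearly in λ. Whether the paper's asset-liability model shows the same separation is not stated in this summary.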
Taxonomy
Topics: Adaptive Dynamic Programming Control
