Data-Efficient Quadratic Q-Learning Using LMIs
J.S. van Hulst, W.P.M.H. Heemels, D.J. Antunes

TL;DR
This paper introduces two data-efficient off-policy reinforcement learning methods, LMI-QL and LMI-QLi, that use convex optimization with linear matrix inequalities (LMIs) to improve learning speed in quadratic Q-learning.
Contribution
The paper presents novel LMI-based Q-learning algorithms that enhance data efficiency and convergence speed using convex relaxations and semidefinite programming.
Findings
LMI-QL and LMI-QLi outperform existing methods in numerical case studies.
The methods significantly reduce training data requirements.
Convex optimization techniques effectively improve Q-learning performance.
Abstract
Reinforcement learning (RL) has seen significant research and application results but often requires large amounts of training data. This paper proposes two data-efficient off-policy RL methods that use parametrized Q-learning. In these methods, the Q-function is chosen to be linear in the parameters and quadratic in selected basis functions in the state and control deviations from a base policy. A cost penalizing the ℓ1-norm of Bellman errors is minimized. We propose two methods: Linear Matrix Inequality Q-Learning (LMI-QL) and its iterative variant (LMI-QLi), which solve the resulting episodic optimization problem through convex optimization. LMI-QL relies on a convex relaxation that yields a semidefinite programming (SDP) problem with linear matrix inequalities (LMIs). LMI-QLi entails solving sequential iterations of an SDP problem. Both methods combine convex optimization with…
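To make the abstract's ingredients concrete, the following is a minimal sketch of a Q-function that is quadratic in basis functions and control deviations, together with a sum of absolute Bellman errors over a batch of transitions. It is not the paper's LMI-QL or LMI-QLi algorithm: there is no convex relaxation or SDP step, only an evaluation of the cost for a given parameter matrix. Everything named here is an assumption for illustration: the basis phi(x) = x, a zero base policy, the scalar linear system and its constants, the function names, and the reading of the norm as a sum of absolute values.

```python
import numpy as np


def q_value(P, phi, v):
    """Quadratic Q-value z^T P z with z = [phi; v]; linear in the entries of P."""
    z = np.concatenate([phi, v])
    return float(z @ P @ z)


def min_q_value(P, phi):
    """Minimum of z^T P z over v, assuming P is symmetric with a positive definite v-block."""
    n = phi.size
    Pxx, Pxv, Pvv = P[:n, :n], P[:n, n:], P[n:, n:]
    schur = Pxx - Pxv @ np.linalg.solve(Pvv, Pxv.T)
    return float(phi @ schur @ phi)


def l1_bellman_cost(P, transitions, gamma=1.0):
    """Sum of absolute Bellman errors over (phi, v, stage_cost, phi_next) tuples."""
    total = 0.0
    for phi, v, stage_cost, phi_next in transitions:
        target = stage_cost + gamma * min_q_value(P, phi_next)
        total += abs(q_value(P, phi, v) - target)
    return total


# Hypothetical scalar example: x_next = a*x + b*u, stage cost q*x^2 + r*u^2.
a, b, q, r = 0.9, 0.5, 1.0, 1.0

# Fixed-point iteration on the scalar Riccati equation; the optimal value is p*x^2.
p = q
for _ in range(500):
    p = q + a**2 * p - (a * b * p) ** 2 / (r + b**2 * p)

# Exact Q-function matrix for this toy problem, used only as a reference point.
P_true = np.array([[q + p * a**2, p * a * b],
                   [p * a * b, r + p * b**2]])

# Off-policy data: states and inputs are sampled at random, not from the optimal policy.
rng = np.random.default_rng(0)
transitions = []
for _ in range(50):
    x, u = rng.normal(size=2)
    x_next = a * x + b * u
    transitions.append(
        (np.array([x]), np.array([u]), q * x**2 + r * u**2, np.array([x_next]))
    )

cost_true = l1_bellman_cost(P_true, transitions)
cost_bad = l1_bellman_cost(P_true + 0.1 * np.eye(2), transitions)
print(cost_true, cost_bad)
```

In this toy setting the exact Q-function matrix drives the cost to zero up to rounding, while a perturbed matrix does not. Searching over P to minimize this cost is the step the paper addresses with convex optimization, and it is not reproduced here.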
Taxonomy
Topics: Neural Networks and Applications · Fault Detection and Control Systems · Machine Learning and ELM
Methods: Balanced Selection · Q-Learning
