Bridging the gap between QP-based and MPC-based RL
Shambhuraj Sawant, Sebastien Gros

TL;DR
This paper introduces a method that combines the flexibility of QP-based reinforcement learning with the interpretability of MPC schemes, enabling a trade-off between learning capacity and explainability.
Contribution
It proposes tools to structure QPs to resemble MPC schemes, enhancing policy explainability while maintaining learning flexibility.
Findings
Structured QPs improve policy interpretability.
Trade-off tools allow balancing flexibility and explainability.
Method demonstrated on a point-mass task.
Abstract
Reinforcement learning methods typically use Deep Neural Networks (DNNs) to approximate the value functions and policies underlying a Markov Decision Process. Unfortunately, DNN-based RL suffers from a lack of explainability of the resulting policy. In this paper, we instead approximate the policy and value functions using optimization problems taking the form of Quadratic Programs (QPs). We propose simple tools to promote structures in the QP, pushing it to resemble a linear MPC scheme. A generic unstructured QP offers high flexibility for learning, while a QP having the structure of an MPC scheme promotes the explainability of the resulting policy and additionally provides ways to analyze it. The tools we propose allow for continuously adjusting the trade-off between the former and the latter during learning. We illustrate the workings of our proposed method with the resulting structure…
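To make the phrase "a QP having the structure of an MPC scheme" concrete, the following is a minimal sketch, not the authors' implementation. It writes a horizon-N linear MPC for a discretized point mass as an equality-constrained QP and reads the policy off as the first input of the solution. The model matrices, weights, horizon, and all function names are assumptions chosen for illustration, inequality constraints are omitted, and `off_structure_penalty` is a hypothetical measure of how far a generic QP cost matrix is from the MPC sparsity pattern; the paper's actual structure-promoting tools are not described in the text above.

```python
import numpy as np

def mpc_as_qp(A, B, Q, R, N):
    """Write a horizon-N linear MPC as an equality-constrained QP.

    Decision vector: z = [u_0, ..., u_{N-1}, x_1, ..., x_N].
    Cost: 0.5 * z^T H z.  Constraints (the dynamics): G z = E x0.
    """
    n, m = B.shape
    H = np.zeros((N * (m + n), N * (m + n)))
    G = np.zeros((N * n, N * (m + n)))
    E = np.zeros((N * n, n))
    for k in range(N):
        u = slice(k * m, (k + 1) * m)
        x_next = slice(N * m + k * n, N * m + (k + 1) * n)
        rows = slice(k * n, (k + 1) * n)
        H[u, u] = R            # input penalty on u_k
        H[x_next, x_next] = Q  # state penalty on x_{k+1}
        # Dynamics row: x_{k+1} - B u_k - A x_k = 0
        G[rows, x_next] = np.eye(n)
        G[rows, u] = -B
        if k == 0:
            E[rows, :] = A     # x_0 is a parameter, so A x_0 moves to the right-hand side
        else:
            x_prev = slice(N * m + (k - 1) * n, N * m + k * n)
            G[rows, x_prev] = -A
    return H, G, E

def solve_qp(H, G, E, x0):
    """Solve the QP through its KKT linear system and return the primal solution z."""
    nz, nc = H.shape[0], G.shape[0]
    kkt = np.block([[H, G.T], [G, np.zeros((nc, nc))]])
    rhs = np.concatenate([np.zeros(nz), E @ x0])
    return np.linalg.solve(kkt, rhs)[:nz]

def qp_policy(H, G, E, x0, m):
    """The policy is the first input u_0 of the QP solution."""
    return solve_qp(H, G, E, x0)[:m]

def off_structure_penalty(H_generic, H_mpc):
    """Hypothetical structure measure: squared mass of H_generic on entries
    that are zero in this particular MPC cost matrix."""
    mask = (H_mpc == 0)
    return float(np.sum(H_generic[mask] ** 2))

# Assumed point-mass model (double integrator) with sampling time dt.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)
R = 0.1 * np.eye(1)
N = 10
n, m = B.shape

H, G, E = mpc_as_qp(A, B, Q, R, N)
x0 = np.array([1.0, 0.0])
u0 = qp_policy(H, G, E, x0, m)
```

In this sketch the MPC structure shows up as a block-diagonal `H` and a banded `G`. A generic learned QP would fill these matrices densely, and a penalty such as `off_structure_penalty`, weighted by a tunable coefficient, is one conceivable way to trade flexibility against structure during learning.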
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
