Bridging the gap between QP-based and MPC-based RL

Shambhuraj Sawant; Sebastien Gros

arXiv:2205.08856 · eess.SY · May 19, 2022 · 1 citation

Open Access

TL;DR

This paper introduces a method that combines the flexibility of QP-based reinforcement learning with the interpretability of MPC schemes, enabling a trade-off between learning capacity and explainability.

Contribution

It proposes tools to structure QPs to resemble MPC schemes, enhancing policy explainability while maintaining learning flexibility.

Findings

01 Structured QPs improve policy interpretability.

02 Trade-off tools allow balancing flexibility and explainability.

03 Method demonstrated on a point-mass task.

Abstract

Reinforcement learning methods typically use Deep Neural Networks (DNNs) to approximate the value functions and policies underlying a Markov Decision Process. Unfortunately, DNN-based RL suffers from a lack of explainability of the resulting policy. In this paper, we instead approximate the policy and value functions using an optimization problem taking the form of a Quadratic Program (QP). We propose simple tools to promote structures in the QP, pushing it to resemble a linear MPC scheme. A generic unstructured QP offers high flexibility for learning, while a QP having the structure of an MPC scheme promotes the explainability of the resulting policy and additionally provides ways to analyze it. The tools we propose allow for continuously adjusting the trade-off between the former and the latter during learning. We illustrate the workings of our proposed method with the resulting structure…
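The relationship the abstract describes, a linear MPC scheme being one particular structured instance of a generic QP policy, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the problem is unconstrained (no inequality constraints, so the QP reduces to a linear solve), the system is a discrete-time double-integrator point mass, and the names `mpc_as_qp`, `qp_policy`, and `structure_gap` are hypothetical. In particular, the squared Frobenius distance in `structure_gap` is only a stand-in for "how far a QP is from MPC structure"; the abstract does not specify the tools the authors actually use.

```python
import numpy as np


def mpc_as_qp(A, B, Q, R, N):
    """Condense an unconstrained linear MPC into QP data (H, F).

    MPC cost: sum_{k=0}^{N-1} (x_k' Q x_k + u_k' R u_k) + x_N' Q x_N,
    with x_{k+1} = A x_k + B u_k and x_0 = s.
    Up to a constant, the cost equals 0.5 z' H z + z' F s,
    where z stacks the inputs u_0, ..., u_{N-1}.
    """
    n, m = B.shape
    # Stacked predictions: [x_1; ...; x_N] = Phi s + Gamma z
    Phi = np.vstack([np.linalg.matrix_power(A, k + 1) for k in range(N)])
    Gamma = np.zeros((N * n, N * m))
    for i in range(N):
        for j in range(i + 1):
            Gamma[i * n:(i + 1) * n, j * m:(j + 1) * m] = (
                np.linalg.matrix_power(A, i - j) @ B
            )
    Qbar = np.kron(np.eye(N), Q)
    Rbar = np.kron(np.eye(N), R)
    H = 2.0 * (Gamma.T @ Qbar @ Gamma + Rbar)
    F = 2.0 * (Gamma.T @ Qbar @ Phi)
    return H, F


def qp_policy(H, F, s, m):
    """Policy of an unconstrained QP: minimize 0.5 z' H z + z' F s over z,
    then apply the first m entries of the minimizer as the action."""
    z = np.linalg.solve(H, -F @ s)
    return z[:m]


def structure_gap(H, F, H_mpc, F_mpc):
    """Illustrative (assumed) measure of distance from MPC structure."""
    return np.linalg.norm(H - H_mpc) ** 2 + np.linalg.norm(F - F_mpc) ** 2


# Assumed point-mass model: state = (position, velocity), input = force.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt ** 2], [dt]])
Q = np.eye(2)
R = np.array([[0.1]])
N = 10

H_mpc, F_mpc = mpc_as_qp(A, B, Q, R, N)
s = np.array([1.0, 0.0])
u = qp_policy(H_mpc, F_mpc, s, m=1)

# A generic QP with the same dimensions but no MPC structure:
# a positive semidefinite perturbation keeps H_gen positive definite.
rng = np.random.default_rng(0)
E = rng.standard_normal(H_mpc.shape)
H_gen = H_mpc + 0.1 * E @ E.T
F_gen = F_mpc + 0.1 * rng.standard_normal(F_mpc.shape)
gap = structure_gap(H_gen, F_gen, H_mpc, F_mpc)
```

In this sketch the generic pair `(H_gen, F_gen)` has freely adjustable entries, while `(H_mpc, F_mpc)` is fully determined by `A`, `B`, `Q`, `R`, and `N`. Weighting a term like `gap` in a learning objective is one conceivable way to move between the two extremes, offered here only as an illustration of the trade-off.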

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms