Offline Reinforcement Learning via Inverse Optimization
Ioannis Dimanidis, Tolga Ok, Peyman Mohajerin Esfahani

TL;DR
This paper introduces an offline reinforcement learning algorithm for continuous state and action spaces that combines inverse optimization with robust model predictive control, achieving competitive results with fewer parameters.
Contribution
It proposes a new offline RL method combining inverse optimization with a convex reformulation of robust MPC, enhancing expressiveness and sample efficiency.
Findings
Reliable recovery of teacher behavior in MuJoCo benchmarks
Achieves competitive results with significantly fewer parameters
Provides an open-source implementation for reproducibility
Abstract
Inspired by the recent successes of Inverse Optimization (IO) across various application domains, we propose a novel offline Reinforcement Learning (ORL) algorithm for continuous state and action spaces, leveraging the convex loss function called "sub-optimality loss" from the IO literature. To mitigate the distribution shift commonly observed in ORL problems, we further employ a robust and non-causal Model Predictive Control (MPC) expert that steers a nominal model of the dynamics using in-hindsight information stemming from the model mismatch. Unlike the existing literature, our robust MPC expert enjoys an exact and tractable convex reformulation. In the second part of this study, we show that the IO hypothesis class, trained by the proposed convex loss function, enjoys ample expressiveness and reliably recovers teacher behavior in MuJoCo benchmarks. The method achieves competitive…
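To make the "sub-optimality loss" concrete, here is a minimal sketch under an assumed hypothesis class that is not taken from the paper: the learned policy picks the action u minimizing a quadratic cost F(s, u) = uᵀQu + 2uᵀRs over unconstrained actions, with parameters θ = (Q, R) and Q positive definite. In the generic IO form, the loss at a recorded state-action pair (ŝ, û) is the cost of the recorded action minus the best achievable cost at that state, F(ŝ, û) − min_u F(ŝ, u). For this quadratic form it equals (û − u*)ᵀQ(û − u*) with u* = −Q⁻¹Rŝ, so it is zero exactly when the recorded action is optimal for the learned cost. All function names below are hypothetical; the paper's actual hypothesis class, action constraints, and normalization of θ may differ.

```python
import numpy as np


def optimal_action(Q: np.ndarray, R: np.ndarray, s: np.ndarray) -> np.ndarray:
    """Minimizer of the assumed cost F(s, u) = u^T Q u + 2 u^T R s over unconstrained u.

    Setting the gradient 2 Q u + 2 R s to zero gives u* = -Q^{-1} R s
    (requires Q positive definite).
    """
    return -np.linalg.solve(Q, R @ s)


def cost(Q: np.ndarray, R: np.ndarray, s: np.ndarray, u: np.ndarray) -> float:
    """Value of the assumed quadratic cost F(s, u)."""
    return float(u @ Q @ u + 2.0 * u @ R @ s)


def suboptimality_loss(Q: np.ndarray, R: np.ndarray,
                       s_hat: np.ndarray, u_hat: np.ndarray) -> float:
    """Cost of the recorded (expert) action minus the optimal cost at the same state."""
    u_star = optimal_action(Q, R, s_hat)
    return cost(Q, R, s_hat, u_hat) - cost(Q, R, s_hat, u_star)


def average_loss(Q: np.ndarray, R: np.ndarray, states, actions) -> float:
    """Empirical sub-optimality loss over a dataset of (state, action) pairs."""
    return float(np.mean([suboptimality_loss(Q, R, s, u)
                          for s, u in zip(states, actions)]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_s, n_u = 3, 2
    Q = np.eye(n_u)
    R = rng.standard_normal((n_u, n_s))
    s = rng.standard_normal(n_s)
    u_expert = optimal_action(Q, R, s)
    # Zero (up to rounding) when the recorded action is optimal for (Q, R).
    print(suboptimality_loss(Q, R, s, u_expert))
    # Positive for any other action.
    print(suboptimality_loss(Q, R, s, u_expert + 0.1))
```

In this quadratic case the loss is convex in (Q, R), because the optimal-cost term −ŝᵀRᵀQ⁻¹Rŝ enters with a minus sign and ŝᵀRᵀQ⁻¹Rŝ is a matrix-fractional function. As is common for IO losses, fitting θ by minimizing it generally needs some normalization on θ, since the trivial cost θ = 0 also drives the loss to zero.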
Taxonomy
Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Adaptive Dynamic Programming Control
