Offline Reinforcement Learning via Inverse Optimization

Ioannis Dimanidis; Tolga Ok; Peyman Mohajerin Esfahani

arXiv:2502.20030 · cs.LG · March 19, 2026

TL;DR

This paper introduces an offline reinforcement learning algorithm for continuous state and action spaces that combines the convex sub-optimality loss from inverse optimization with a robust model predictive control expert, achieving competitive results with fewer parameters.

Contribution

The paper proposes a new offline RL method that combines inverse optimization with an exact and tractable convex reformulation of a robust MPC expert, enhancing expressiveness and sample efficiency.

Findings

01 Reliable recovery of teacher behavior in MuJoCo benchmarks

02 Achieves competitive results with significantly fewer parameters

03 Provides an open-source implementation for reproducibility

Abstract

Inspired by the recent successes of Inverse Optimization (IO) across various application domains, we propose a novel offline Reinforcement Learning (ORL) algorithm for continuous state and action spaces, leveraging the convex loss function called "sub-optimality loss" from the IO literature. To mitigate the distribution shift commonly observed in ORL problems, we further employ a robust and non-causal Model Predictive Control (MPC) expert steering a nominal model of the dynamics using in-hindsight information stemming from the model mismatch. Unlike the existing literature, our robust MPC expert enjoys an exact and tractable convex reformulation. In the second part of this study, we show that the IO hypothesis class, trained by the proposed convex loss function, enjoys ample expressiveness and reliably recovers teacher behavior in MuJoCo benchmarks. The method achieves competitive…
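
As a rough illustration of the sub-optimality loss named in the abstract, the sketch below scores a cost parameter theta by how much worse the expert's action is than the best available action under that cost. The specifics are assumptions made for illustration and are not taken from the paper or its repository: the two-term feature map, the finite action grid standing in for a continuous action set, and all function names. Because the hypothesized cost is linear in theta, the minimum over actions is concave in theta, so the loss is convex in theta.

```python
from typing import List, Sequence


def features(state: float, action: float) -> List[float]:
    # Hypothetical feature map (an assumption): a quadratic action penalty
    # and a state-action coupling term.
    return [action * action, state * action]


def cost(theta: Sequence[float], state: float, action: float) -> float:
    # Hypothesized cost, linear in theta: F_theta(s, u) = theta . phi(s, u).
    return sum(t * f for t, f in zip(theta, features(state, action)))


def io_policy(theta: Sequence[float], state: float,
              candidates: Sequence[float]) -> float:
    # The induced policy picks the candidate action with the lowest cost.
    return min(candidates, key=lambda u: cost(theta, state, u))


def suboptimality_loss(theta: Sequence[float], state: float,
                       expert_action: float,
                       candidates: Sequence[float]) -> float:
    # Cost of the expert's action minus the best achievable cost.
    # It is zero exactly when the expert's action is optimal under theta
    # (assuming the expert's action is one of the candidates).
    best = min(cost(theta, state, u) for u in candidates)
    return cost(theta, state, expert_action) - best


# Finite grid on [-1, 1] standing in for a continuous action set (assumption).
candidates = [k / 10 for k in range(-10, 11)]

theta_good = [1.0, -1.0]  # cost u^2 - s*u, minimized at u = 0.5 when s = 1
theta_bad = [1.0, 1.0]    # cost u^2 + s*u, minimized at u = -0.5 when s = 1

loss_good = suboptimality_loss(theta_good, 1.0, 0.5, candidates)
loss_bad = suboptimality_loss(theta_bad, 1.0, 0.5, candidates)
print(loss_good, loss_bad)
```

In this toy setting the parameter whose induced policy reproduces the expert's action gets zero loss, while the other is penalized by its cost gap. Fitting theta would amount to minimizing the average of this loss over a dataset of state-action pairs, which is a convex problem under the linear-in-theta assumption.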

Code & Models

Repositories

tolgaok/offlinerlviaio · JAX · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Adaptive Dynamic Programming Control