Adaptive Inverse Reinforcement Learning with Online Off-Policy Data Collection

Yibei Li; Yuexin Cao; Zhixin Liu; Lihua Xie

arXiv:2511.15171 · math.OC · November 20, 2025

Open Access

TL;DR

This paper introduces an adaptive, model-free inverse reinforcement learning (IRL) algorithm that learns cost functions online from off-policy data. The method extends to nonlinear systems, has a proven sublinear convergence rate under system noise, and is validated on numerical examples.

Contribution

The paper presents an online IRL method built on full Nesterov-Todd (NT)-step primal-dual interior-point iterations. It handles off-policy data and nonlinear systems without prior knowledge of the system model parameters.
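
As a rough illustration of the scaling behind an NT step, the sketch below computes the standard Nesterov-Todd scaling point W for a pair of symmetric positive definite matrices X (primal) and S (dual), defined by W S W = X. This is the textbook construction for semidefinite programming, not the paper's algorithm; the function names `sym_sqrt` and `nt_scaling` and the random test matrices are assumptions made for this example.

```python
import numpy as np

def sym_sqrt(M):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T

def nt_scaling(X, S):
    """Nesterov-Todd scaling point W: the SPD matrix satisfying W S W = X.

    Uses the closed form W = X^{1/2} (X^{1/2} S X^{1/2})^{-1/2} X^{1/2}.
    Hypothetical helper for illustration only.
    """
    Xh = sym_sqrt(X)
    M = Xh @ S @ Xh
    M = (M + M.T) / 2                      # symmetrize against round-off
    w, V = np.linalg.eigh(M)
    M_inv_half = (V / np.sqrt(w)) @ V.T    # M^{-1/2}
    W = Xh @ M_inv_half @ Xh
    return (W + W.T) / 2

# Random SPD primal/dual pair (illustrative data, not from the paper).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
X = A @ A.T + 4 * np.eye(4)
S = B @ B.T + 4 * np.eye(4)

W = nt_scaling(X, S)
residual = np.linalg.norm(W @ S @ W - X)   # should be near machine precision
```

In a primal-dual interior-point method, W is used to rescale the Newton system so that the primal and dual iterates are treated symmetrically; how the paper builds its full NT-step iteration on top of this is not shown in the excerpt.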

Findings

01. Achieves sublinear convergence despite system noise.

02. Effectively generalizes to nonlinear IRL via differential dynamic programming.

03. Demonstrates efficiency and effectiveness through numerical examples.

Abstract

This paper addresses the inverse reinforcement learning (IRL) problem of reconstructing, in a model-free manner, the unknown cost function underlying an observed optimal policy. How to adapt such a reconstruction online using completely off-policy system data remains unclear in the literature. Without prior knowledge of the system model parameters, an adaptive and direct learning rule for the cost parameter is proposed using online off-policy system data, which need only satisfy the mild persistently exciting condition of the general data-driven paradigm. The adaptive and online IRL algorithm is obtained by designing full Nesterov-Todd (NT)-step primal-dual interior-point iterations. Although the algorithm solves a nonlinear and time-varying semidefinite program (SDP), the influence of system noise is rigorously analyzed, and the proposed online algorithm is shown to achieve sublinear convergence. The…
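
The persistently exciting condition mentioned in the abstract can be illustrated with a small numerical check. The excerpt does not state the paper's exact condition, so this sketch assumes the common definition from data-driven control: a signal is persistently exciting of order L if its depth-L block Hankel matrix has full row rank. The names `hankel_matrix` and `is_persistently_exciting` are hypothetical.

```python
import numpy as np

def hankel_matrix(u, depth):
    """Block Hankel matrix of a signal u with shape (T,) or (T, m)."""
    u = np.asarray(u, dtype=float)
    if u.ndim == 1:
        u = u[:, None]                     # treat as T samples of a scalar signal
    T, m = u.shape
    if T < depth:
        raise ValueError("signal is shorter than the requested depth")
    cols = T - depth + 1
    # Block row i holds samples i, i+1, ..., i+cols-1 (each an m-vector).
    return np.vstack([u[i:i + cols].T for i in range(depth)])

def is_persistently_exciting(u, order):
    """True if the depth-`order` Hankel matrix of u has full row rank.

    Assumed definition (Hankel-rank test); the paper's condition may differ.
    """
    H = hankel_matrix(u, order)
    return np.linalg.matrix_rank(H) == H.shape[0]

rng = np.random.default_rng(0)
random_input = rng.standard_normal(20)     # rich signal
constant_input = np.ones(20)               # carries no new information over time

pe_random = is_persistently_exciting(random_input, 3)
pe_constant = is_persistently_exciting(constant_input, 2)
```

A random input passes the test, while a constant input fails it for any order above one, since every row of its Hankel matrix is identical.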

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research