Adaptive Inverse Reinforcement Learning with Online Off-Policy Data Collection
Yibei Li, Yuexin Cao, Zhixin Liu, Lihua Xie

TL;DR
This paper introduces an adaptive, model-free inverse reinforcement learning algorithm that learns cost functions online from off-policy data, applicable to nonlinear systems, with proven convergence and demonstrated effectiveness.
Contribution
It presents a novel online IRL method using Nesterov-Todd interior-point iterations that handles nonlinear systems and off-policy data without prior system knowledge.
Findings
Achieves sublinear convergence despite system noise.
Effectively generalizes to nonlinear IRL via differential dynamic programming.
Demonstrates efficiency and effectiveness through numerical examples.
Abstract
This paper addresses the inverse reinforcement learning (IRL) problem of reconstructing, in a model-free manner, the unknown cost function underlying an observed optimal policy. How to adapt such a reconstruction online using completely off-policy system data remains unclear in the literature. Without prior knowledge of the system model parameters, an adaptive and direct learning rule for the cost parameter is proposed using online off-policy system data, which need only satisfy the mild persistently exciting condition of the general data-driven paradigm. The adaptive online IRL algorithm is obtained by designing full Nesterov-Todd (NT)-step primal-dual interior-point iterations. Although the algorithm solves a nonlinear and time-varying semi-definite program (SDP), the influence of system noise is rigorously analyzed, and the proposed online algorithm is shown to achieve sublinear convergence. The…
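The abstract requires the off-policy data to be persistently exciting but does not spell out the exact condition. As a rough illustration only, the sketch below uses the standard Hankel-rank definition from the data-driven control literature: a signal is persistently exciting of a given order if its block-Hankel matrix of that depth has full row rank. The function names `block_hankel` and `is_persistently_exciting` and the tolerance are assumptions made for this example, not part of the paper.

```python
import numpy as np


def block_hankel(u, order):
    """Build the block-Hankel matrix of depth `order` from a signal u.

    u has shape (T,) or (T, m): T samples of an m-dimensional signal.
    The result has shape (order * m, T - order + 1).
    """
    u = np.asarray(u, dtype=float)
    if u.ndim == 1:
        u = u[:, None]  # treat a 1-D array as a scalar signal
    T, _ = u.shape
    cols = T - order + 1
    if cols < 1:
        raise ValueError("signal too short for the requested order")
    # Block row i holds the samples u[i], ..., u[i + cols - 1] as columns.
    return np.vstack([u[i:i + cols].T for i in range(order)])


def is_persistently_exciting(u, order, tol=1e-8):
    """Return True if the Hankel matrix of depth `order` has full row rank.

    This is the textbook rank test, assumed here for illustration; the
    paper's own condition may be stated differently.
    """
    H = block_hankel(u, order)
    return bool(np.linalg.matrix_rank(H, tol=tol) == H.shape[0])


if __name__ == "__main__":
    t = np.arange(60)
    constant = np.ones(60)          # carries no excitation beyond order 1
    sinusoid = np.sin(0.3 * t)      # one frequency: exciting of order 2 only
    noise = np.random.default_rng(0).standard_normal(60)

    print(is_persistently_exciting(constant, 2))
    print(is_persistently_exciting(sinusoid, 2), is_persistently_exciting(sinusoid, 3))
    print(is_persistently_exciting(noise, 5))
```

A single sinusoid satisfies a two-term linear recurrence, so its depth-3 Hankel matrix loses rank, whereas broadband noise typically passes the test at much higher orders. This is one reason exploratory off-policy inputs are often chosen to be noisy or multi-frequency.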
Taxonomy
Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research
