Nonuniqueness and Convergence to Equivalent Solutions in Observer-based Inverse Reinforcement Learning
Jared Town, Zachary Morrison, Rushikesh Kamalapurkar

TL;DR
This paper introduces an online, real-time observer-based method for inverse reinforcement learning that addresses the challenge of nonuniqueness by converging to approximately equivalent solutions, supported by new data-richness conditions and simulation results.
Contribution
It presents a novel regularized history stack observer for IRL that converges to approximately equivalent solutions online and in real time, filling a gap left by existing methods that address nonuniqueness only offline.
Findings
The proposed method converges to approximately equivalent IRL solutions.
New data-richness conditions enable convergence analysis.
Simulation results demonstrate the effectiveness of the approach.
Abstract
A key challenge in solving the deterministic inverse reinforcement learning (IRL) problem online and in real time is the existence of multiple solutions. Nonuniqueness necessitates the study of the notion of equivalent solutions, i.e., solutions that result in a different cost functional but the same feedback matrix, and of convergence to such solutions. While offline algorithms that result in convergence to equivalent solutions have been developed in the literature, online, real-time techniques that address nonuniqueness are not available. In this paper, a regularized history stack observer that converges to approximately equivalent solutions of the IRL problem is developed. Novel data-richness conditions are developed to facilitate the analysis, and simulation results are provided to demonstrate the effectiveness of the developed technique.
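To make the notion of equivalent solutions concrete, here is a minimal sketch that assumes a scalar linear-quadratic setting, which the abstract does not specify. The system `dx/dt = a*x + b*u`, the cost weights `q` and `r`, and the helper `lqr_gain` are illustrative assumptions, not the paper's observer. The sketch shows only one well-known source of nonuniqueness: scaling the cost by a positive constant changes the cost functional but leaves the optimal feedback gain unchanged.

```python
import math

def lqr_gain(a, b, q, r):
    """Optimal gain k (control u = -k*x) for the scalar system dx/dt = a*x + b*u
    with cost integral of (q*x^2 + r*u^2) dt. Assumes q > 0, r > 0, b != 0.
    Hypothetical helper for illustration only."""
    # Stabilizing root of the scalar algebraic Riccati equation
    #   2*a*p - (b^2 / r) * p^2 + q = 0
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r

# Two different cost functionals: the second is the first scaled by 10.
k1 = lqr_gain(a=1.0, b=2.0, q=3.0, r=1.0)
k2 = lqr_gain(a=1.0, b=2.0, q=30.0, r=10.0)

# The gains agree up to floating-point rounding, so an observer that sees
# only states and controls cannot tell the two costs apart.
print(k1, k2)
```

In this scalar case the gain depends on `q` and `r` only through the ratio `q / r`, so every pair with the same ratio is an equivalent solution in the sense of the abstract.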
Taxonomy
Topics: Iterative Learning Control Systems · Piezoelectric Actuators and Control · Adaptive Dynamic Programming Control
