Consistent inverse optimal control for infinite time-horizon discounted nonlinear systems under noisy observations
Ziliang Wang, Axel Ringh, Han Zhang

TL;DR
This paper introduces a robust inverse optimal control framework for infinite-horizon discounted nonlinear systems that effectively estimates underlying costs from noisy observations using convex optimization and moment-based methods.
Contribution
It extends previous IOC methods to handle noisy data in infinite-horizon discounted MDPs with weak Feller transition kernels, using occupation measures and GMM estimators.
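To illustrate the occupation measures mentioned above, here is a minimal sketch for a finite-state, finite-action analogue. It is not the paper's construction, which works with weak Feller kernels and polynomial approximations. The function names, array layout, and the unnormalized convention (total mass 1/(1 - gamma)) are assumptions made for this example. It computes the discounted occupation measure of a fixed policy and checks the linear flow constraint that such measures satisfy.

```python
import numpy as np


def discounted_occupation_measure(P, policy, mu0, gamma):
    """Discounted state-action occupation measure of a fixed policy.

    Hypothetical layout (an assumption of this sketch):
      P[a, s, t]   = probability of moving from state s to state t under action a
      policy[s, a] = probability of choosing action a in state s
      mu0[s]       = initial state distribution
    Returns mu with mu[s, a] = sum_k gamma**k * Pr(s_k = s, a_k = a).
    """
    n_states = P.shape[1]
    # State-to-state kernel induced by the policy.
    P_pi = np.einsum("sa,ast->st", policy, P)
    # The state occupation d solves d = mu0 + gamma * P_pi^T d.
    d = np.linalg.solve(np.eye(n_states) - gamma * P_pi.T, mu0)
    return d[:, None] * policy


def flow_residual(mu, P, mu0, gamma):
    """Residual of the linear constraint satisfied by occupation measures."""
    inflow = np.einsum("ast,sa->t", P, mu)
    return mu.sum(axis=1) - gamma * inflow - mu0


# Small random example: 3 states, 2 actions.
rng = np.random.default_rng(0)
P = rng.random((2, 3, 3))
P /= P.sum(axis=2, keepdims=True)
policy = rng.random((3, 2))
policy /= policy.sum(axis=1, keepdims=True)
mu0 = np.array([1.0, 0.0, 0.0])
gamma = 0.9

mu = discounted_occupation_measure(P, policy, mu0, gamma)
residual = flow_residual(mu, P, mu0, gamma)
```

Because the constraint is linear in `mu`, optimizing over occupation measures instead of policies gives a linear program in the finite case. The convex formulation described in this summary builds on the same linear structure.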
Findings
The method is statistically consistent and asymptotically optimal.
The convex optimization approach reduces issues with local minima.
Numerical examples demonstrate effectiveness under noise.
Abstract
Inverse optimal control (IOC) aims to estimate the underlying cost that governs the observed behavior of an expert system. However, in practical scenarios, the collected data are often corrupted by noise, which poses significant challenges for accurate cost function recovery. In this work, we propose an IOC framework that effectively addresses the presence of observation noise. In particular, compared to our previous work \cite{wang2025consistent}, we consider the case of discrete-time, infinite-horizon, discounted MDPs whose transition kernel is only weak Feller. By leveraging the occupation measure framework, we first establish necessary and sufficient optimality conditions for the expert policy and then construct an infinite-dimensional optimization problem based on these conditions. This problem is then approximated by polynomials to get a finite-dimensional numerically solvable…
Taxonomy
Topics: Advanced Control Systems Optimization · Control Systems and Identification · Adaptive Dynamic Programming Control
