Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees
Siliang Zeng, Mingyi Hong, Alfredo Garcia

TL;DR
This paper introduces a single-loop algorithm with finite-time guarantees for structural estimation of Markov decision processes in high-dimensional state spaces, aiming to reduce computational cost without sacrificing reward estimation accuracy in inverse reinforcement learning tasks.
Contribution
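The paper proposes a single-loop estimation algorithm with finite-time convergence guarantees for MDPs with high-dimensional state spaces. It targets two problems: the computational burden of nested-loop estimation methods and the reduced reward estimation accuracy of policy-focused IRL approaches.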
Findings
The algorithm converges to a stationary point with a finite-time guarantee
Outperforms existing IRL and imitation learning baselines
Effective in high-dimensional state spaces and transfer settings
Abstract
We consider the task of estimating a structural model of dynamic decisions by a human agent based upon the observable history of implemented actions and visited states. This problem has an inherent nested structure: in the inner problem, an optimal policy for a given reward function is identified while in the outer problem, a measure of fit is maximized. Several approaches have been proposed to alleviate the computational burden of this nested-loop structure, but these methods still suffer from high complexity when the state space is either discrete with large cardinality or continuous in high dimensions. Other approaches in the inverse reinforcement learning (IRL) literature emphasize policy estimation at the expense of reduced reward estimation accuracy. In this paper we propose a single-loop estimation algorithm with finite time guarantees that is equipped to deal with…
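The nested structure described in the abstract can be illustrated with a small sketch. This is not the paper's single-loop algorithm; it is the nested-loop baseline that such an algorithm is meant to avoid. Several things here are assumptions made for illustration: the tabular two-state MDP, the entropy-regularized (soft) policy model, the log-likelihood as the measure of fit, the finite-difference gradient, and all function names.

```python
import numpy as np

def soft_policy(reward, P, gamma=0.9, iters=200):
    """Inner problem: soft-optimal policy for a FIXED reward table.
    reward has shape (S, A); P has shape (S, A, S)."""
    V = np.zeros(reward.shape[0])
    for _ in range(iters):
        Q = reward + gamma * (P @ V)            # (S, A)
        m = Q.max(axis=1)
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))  # soft max over actions
    Q = reward + gamma * (P @ V)
    Q = Q - Q.max(axis=1, keepdims=True)        # stabilize the softmax
    pi = np.exp(Q)
    return pi / pi.sum(axis=1, keepdims=True)

def log_likelihood(theta, P, demos):
    """Measure of fit: log-likelihood of observed (state, action) pairs."""
    pi = soft_policy(theta, P)
    return float(sum(np.log(pi[s, a]) for s, a in demos))

def fit_reward(P, demos, steps=30, lr=0.1, eps=1e-4):
    """Outer problem: gradient ascent on the fit. Every gradient entry
    re-solves the inner problem twice, which is the costly nesting."""
    theta = np.zeros(P.shape[:2])
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for idx in np.ndindex(*theta.shape):
            bump = np.zeros_like(theta)
            bump[idx] = eps
            grad[idx] = (log_likelihood(theta + bump, P, demos)
                         - log_likelihood(theta - bump, P, demos)) / (2 * eps)
        theta = theta + lr * grad
    return theta

# Hypothetical toy MDP: action 0 stays in place, action 1 switches state.
P = np.zeros((2, 2, 2))
for s in range(2):
    P[s, 0, s] = 1.0
    P[s, 1, 1 - s] = 1.0

# Hypothetical demonstrations: the agent moves to state 1 and stays there.
demos = [(0, 1), (1, 0), (0, 1), (1, 0), (1, 0), (0, 1)]

ll_before = log_likelihood(np.zeros((2, 2)), P, demos)
theta_hat = fit_reward(P, demos)
ll_after = log_likelihood(theta_hat, P, demos)
```

Even in this tiny example, each outer step runs the inner value iteration eight times. With a large or continuous state space the inner solve dominates the cost, which is the burden the abstract refers to.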
Taxonomy
Topics: Reinforcement Learning in Robotics
