Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees

Siliang Zeng; Mingyi Hong; Alfredo Garcia

arXiv:2210.01282·cs.LG·March 4, 2024·1 cites

Open Access

TL;DR

This paper introduces a single-loop algorithm for estimating structural models of Markov decision processes in high-dimensional state spaces. The algorithm comes with finite-time guarantees and aims at lower computational cost and more accurate reward estimation in inverse reinforcement learning tasks.

Contribution

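The paper proposes a single-loop estimation algorithm with finite-time convergence guarantees for MDPs with high-dimensional state spaces. It addresses two weaknesses of earlier methods: the computational cost of the nested-loop structure, and the reduced reward estimation accuracy of IRL approaches that emphasize policy estimation.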

Findings

1. The algorithm converges to a stationary point with a finite-time guarantee.

2. It outperforms existing IRL and imitation learning baselines.

3. It is effective in high-dimensional state spaces and in transfer settings.

Abstract

We consider the task of estimating a structural model of dynamic decisions by a human agent based upon the observable history of implemented actions and visited states. This problem has an inherent nested structure: in the inner problem, an optimal policy for a given reward function is identified while in the outer problem, a measure of fit is maximized. Several approaches have been proposed to alleviate the computational burden of this nested-loop structure, but these methods still suffer from high complexity when the state space is either discrete with large cardinality or continuous in high dimensions. Other approaches in the inverse reinforcement learning (IRL) literature emphasize policy estimation at the expense of reduced reward estimation accuracy. In this paper we propose a single-loop estimation algorithm with finite time guarantees that is equipped to deal with…
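
The nested structure described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's single-loop algorithm; it is a minimal nested-loop baseline under several assumptions that do not come from the source: a tabular reward with one parameter per (state, action) pair, an entropy-regularized (soft) optimal policy for the inner problem, mean log-likelihood as the measure of fit, and a finite-difference gradient for the outer problem. All function names and the toy data are hypothetical.

```python
import numpy as np

def soft_log_policy(reward, P, gamma=0.5, iters=60):
    """Inner problem: log of the soft-optimal policy for a fixed reward.

    reward has shape (S, A); P has shape (S, A, S), with P[s, a, s2]
    the probability of moving from s to s2 under action a.
    Entropy regularization is an assumption of this sketch.
    """
    V = np.zeros(reward.shape[0])
    for _ in range(iters):
        Q = reward + gamma * (P @ V)                         # soft Bellman backup
        m = Q.max(axis=1)
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))   # stable log-sum-exp
    Q = reward + gamma * (P @ V)
    m = Q.max(axis=1, keepdims=True)
    return Q - m - np.log(np.exp(Q - m).sum(axis=1, keepdims=True))

def log_likelihood(theta, P, demos):
    """Measure of fit: mean log-probability of observed (state, action) pairs."""
    logp = soft_log_policy(theta, P)
    return logp[demos[:, 0], demos[:, 1]].mean()

def nested_estimate(P, demos, outer_steps=100, lr=0.2, eps=1e-4):
    """Outer problem: gradient ascent on the fit.

    Every likelihood evaluation re-solves the inner problem, which is
    the source of the computational burden of the nested approach.
    """
    theta = np.zeros(P.shape[:2])
    for _ in range(outer_steps):
        base = log_likelihood(theta, P, demos)
        grad = np.zeros_like(theta)
        for idx in np.ndindex(*theta.shape):                 # finite differences
            bumped = theta.copy()
            bumped[idx] += eps
            grad[idx] = (log_likelihood(bumped, P, demos) - base) / eps
        theta += lr * grad
    return theta

# Hypothetical toy problem: 3 states, 2 actions, random transitions,
# and a handful of hand-written (state, action) observations.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(3, 2))                   # shape (3, 2, 3)
demos = np.array([[0, 1], [0, 1], [0, 1], [0, 0],
                  [1, 0], [1, 0], [2, 1], [2, 1]])

theta_hat = nested_estimate(P, demos)
print("fit at zero reward:", log_likelihood(np.zeros((3, 2)), P, demos))
print("fit at estimate:   ", log_likelihood(theta_hat, P, demos))
```

Each outer step here triggers one inner solve per reward parameter plus one more, so the cost grows with both the number of parameters and the size of the state space. This is the kind of burden that motivates avoiding the nested loop.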

Taxonomy

Topics: Reinforcement Learning in Robotics