Uniqueness and Complexity of Inverse MDP Models
Marcus Hutter, Steven Hansen

TL;DR
This paper investigates the uniqueness and complexity of inverse Markov Decision Process (MDP) models. It asks whether multi-step inverse models can be inferred from single-step inverse models and policies, and examines the implications for causal reasoning and reinforcement learning.
Contribution
It analyzes the conditions under which inverse models determine the full dynamics and introduces questions about their inferential and computational properties.
Findings
Inverse models may not uniquely determine forward dynamics.
Multi-step inverse models can be more complex than single-step models.
The work discusses potential algorithms for inference from inverse models.
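The first finding can be illustrated with a deliberately degenerate case. The sketch below is not taken from the paper; the toy MDP, the names `T1`, `T2`, `pi`, and the helper `one_step_inverse` are all assumptions made for illustration. When the next state does not depend on the action, Bayes' rule gives p(a|ss') = π(a|s) regardless of the transition probabilities, so two different forward models can share the same policy and the same 1-step inverse model.

```python
def one_step_inverse(T, pi, s, s_next):
    """Bayes' rule: p(a | s, s') is proportional to pi(a|s) * p(s'|s,a)."""
    joint = [pi[s][a] * T[s][a][s_next] for a in range(len(pi[s]))]
    z = sum(joint)
    return [j / z for j in joint]

# Hypothetical toy MDP: 2 states, 2 actions. T[s][a][s'] = p(s'|s,a).
pi = [[0.3, 0.7], [0.6, 0.4]]

# In both models the next state ignores the action, yet the models differ.
T1 = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
T2 = [[[0.9, 0.1], [0.9, 0.1]], [[0.2, 0.8], [0.2, 0.8]]]

pairs = [(s, t) for s in range(2) for t in range(2)]
inv1 = {k: one_step_inverse(T1, pi, *k) for k in pairs}
inv2 = {k: one_step_inverse(T2, pi, *k) for k in pairs}

# The inverse models agree (up to rounding) although T1 != T2.
same_inverse = all(
    abs(x - y) < 1e-12 for k in pairs for x, y in zip(inv1[k], inv2[k])
)
print(same_inverse, T1 != T2)
```

In this example both inverse models reduce to the policy itself, so the pair (inverse model, policy) carries no information about which of the two transition matrices generated the data.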
Abstract
What is the action sequence aa'a'' that was likely responsible for reaching state s''' (from state s) in 3 steps? Addressing such questions is important in causal reasoning and in reinforcement learning. Inverse "MDP" models p(aa'a''|ss''') can be used to answer them. In the traditional "forward" view, transition "matrix" p(s'|sa) and policy π(a|s) uniquely determine "everything": the whole dynamics p(as'a's''a''...|s), and with it, the action-conditional state process p(s's''...|saa'a''), the multi-step inverse models p(aa'a''...|ss^i), etc. If the latter is our primary concern, a natural question, analogous to the forward case, is to what extent the 1-step inverse model p(a|ss') plus policy π(a|s) determine the multi-step inverse models or even the whole dynamics. In other words, can forward models be inferred from inverse models, or even be side-stepped? This work addresses this question…
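The "forward" direction described in the abstract can be sketched in a few lines: given a transition model and a policy, the inverse models follow from Bayes' rule by marginalizing over intermediate states. The code below is a minimal illustration under assumed names and a made-up 2-state, 2-action MDP (`T`, `pi`, `inverse_1step`, `inverse_2step` are hypothetical, not from the paper), shown for the 1-step and 2-step cases.

```python
from itertools import product

def inverse_1step(T, pi, s, s_next):
    """p(a | s, s'): normalize pi(a|s) * p(s'|s,a) over actions a."""
    joint = [pi[s][a] * T[s][a][s_next] for a in range(len(pi[s]))]
    z = sum(joint)
    return [j / z for j in joint] if z > 0 else None  # None: s' unreachable

def inverse_2step(T, pi, s, s_end):
    """p(a, a' | s, s''): sum the joint over the intermediate state s'."""
    n_states, n_actions = len(T), len(pi[s])
    joint = {}
    for a, a2 in product(range(n_actions), repeat=2):
        joint[(a, a2)] = sum(
            pi[s][a] * T[s][a][m] * pi[m][a2] * T[m][a2][s_end]
            for m in range(n_states)
        )
    z = sum(joint.values())
    return {k: v / z for k, v in joint.items()} if z > 0 else None

# Hypothetical toy MDP: T[s][a][s'] = p(s'|s,a), pi[s][a] = pi(a|s).
T = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
pi = [[0.5, 0.5], [0.5, 0.5]]

inv1 = inverse_1step(T, pi, 0, 1)   # which action took us from state 0 to 1?
inv2 = inverse_2step(T, pi, 0, 1)   # which action pair, over two steps?
print(inv1)
print(max(inv2, key=inv2.get))      # most likely action pair
```

The question the paper poses runs the other way: whether the multi-step quantity computed by `inverse_2step` could be obtained from the 1-step inverse model and the policy alone, without access to `T`.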
Taxonomy
Topics: Economic Policies and Impacts
