Learning non-Markovian Decision-Making from State-only Sequences
Aoyang Qin, Feng Gao, Qing Li, Song-Chun Zhu, Sirui Xie

TL;DR
This paper introduces a deep generative model for imitation learning from state-only sequences in non-Markovian decision processes, enabling decision-making as inference and demonstrating strong results in path planning and MuJoCo tasks.
Contribution
The paper proposes a latent-space model for non-Markovian decision-making from state-only sequences, in which the policy is an energy-based prior, together with a framework that treats decision-making as inference.
Findings
Effective in path planning with non-Markovian constraints
Achieves strong performance on MuJoCo benchmarks
Enables decision-making as inference in non-Markovian settings
Abstract
Conventional imitation learning assumes access to the actions of demonstrators, but these motor signals are often non-observable in naturalistic settings. Additionally, sequential decision-making behaviors in these settings can deviate from the assumptions of a standard Markov Decision Process (MDP). To address these challenges, we explore deep generative modeling of state-only sequences with non-Markov Decision Process (nMDP), where the policy is an energy-based prior in the latent space of the state transition generator. We develop maximum likelihood estimation to achieve model-based imitation, which involves short-run MCMC sampling from the prior and importance sampling for the posterior. The learned model enables decision-making as inference: model-free policy execution is equivalent to prior sampling, and model-based planning is posterior sampling initialized from the policy.…
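The abstract mentions short-run MCMC sampling from an energy-based prior. As a rough illustration only, the sketch below runs a fixed, small number of Langevin steps on a toy quadratic energy. The energy, the function names (`toy_energy_grad`, `short_run_langevin`), the step size, and the number of steps are all assumptions made for this example; in the paper the energy would be a learned function over the latent space, and its exact sampler settings are not given here.

```python
import numpy as np

def toy_energy_grad(z, mu):
    # Gradient of a hypothetical quadratic energy E(z) = 0.5 * ||z - mu||^2.
    # This stands in for the gradient of a learned energy function.
    return z - mu

def short_run_langevin(grad_energy, z0, n_steps=50, step_size=0.5, rng=None):
    # Run a fixed number of Langevin updates targeting p(z) proportional to exp(-E(z)):
    #   z <- z - (step_size^2 / 2) * grad E(z) + step_size * noise
    rng = np.random.default_rng() if rng is None else rng
    z = np.array(z0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(z.shape)
        z = z - 0.5 * step_size**2 * grad_energy(z) + step_size * noise
    return z

# Usage: 2000 independent chains in a 2-D latent space, all started at zero.
mu = np.array([3.0, -1.0])
samples = short_run_langevin(
    lambda z: toy_energy_grad(z, mu),
    z0=np.zeros((2000, 2)),
    rng=np.random.default_rng(0),
)
print(samples.mean(axis=0))  # should land close to mu for this toy energy
```

Because the chain is deliberately short and there is no accept/reject step, the samples are only approximately distributed according to the target; that approximation is the usual trade-off of short-run sampling.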
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
