Learning non-Markovian Decision-Making from State-only Sequences

Aoyang Qin; Feng Gao; Qing Li; Song-Chun Zhu; Sirui Xie

arXiv:2306.15156 · cs.LG · October 31, 2023 · 2 cites

TL;DR

This paper introduces a deep generative model for imitation learning from state-only sequences in non-Markovian decision processes, enabling decision-making as inference and demonstrating strong results in path planning and MuJoCo tasks.

Contribution

It proposes a novel energy-based latent space model for non-Markovian decision-making from state sequences, with a new inference-based decision-making framework.

Findings

01 Effective in path planning with non-Markovian constraints

02 Achieves strong performance on MuJoCo benchmarks

03 Enables decision-making as inference in non-Markovian settings

Abstract

Conventional imitation learning assumes access to the actions of demonstrators, but these motor signals are often non-observable in naturalistic settings. Additionally, sequential decision-making behaviors in these settings can deviate from the assumptions of a standard Markov Decision Process (MDP). To address these challenges, we explore deep generative modeling of state-only sequences with non-Markov Decision Process (nMDP), where the policy is an energy-based prior in the latent space of the state transition generator. We develop maximum likelihood estimation to achieve model-based imitation, which involves short-run MCMC sampling from the prior and importance sampling for the posterior. The learned model enables decision-making as inference: model-free policy execution is equivalent to prior sampling, and model-based planning is posterior sampling initialized from the policy.…
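To make "short-run MCMC sampling from the prior" concrete, here is a minimal sketch of a short-run sampler on a toy energy. It assumes Langevin dynamics as the MCMC kernel and a quadratic energy with a known gradient; neither is stated in the abstract, and the names `toy_energy_grad` and `short_run_langevin` are hypothetical. The paper's prior is a learned energy over latent actions conditioned on the state history, which this toy example does not reproduce.

```python
import numpy as np


def toy_energy_grad(a, center):
    """Gradient of the toy energy E(a) = 0.5 * ||a - center||^2.

    Hypothetical stand-in for the gradient of a learned energy-based prior.
    """
    return a - center


def short_run_langevin(grad_fn, a0, n_steps=50, step_size=0.1, rng=None):
    """Run a fixed, small number of Langevin steps from the initial point a0.

    Update rule: a <- a - step_size * grad E(a) + sqrt(2 * step_size) * noise.
    "Short-run" means the chain is stopped after n_steps rather than being
    run to convergence.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = np.array(a0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(a.shape)
        a = a - step_size * grad_fn(a) + np.sqrt(2.0 * step_size) * noise
    return a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    center = np.array([1.0, -2.0])
    # 2000 independent chains, all initialized at the origin.
    samples = short_run_langevin(
        lambda a: toy_energy_grad(a, center), np.zeros((2000, 2)), rng=rng
    )
    print("sample mean:", samples.mean(axis=0))
```

For this toy energy the chains drift toward `center` and spread out with roughly unit variance, so the sample mean lands close to `center`. In the abstract's terms, drawing such samples from the prior corresponds to model-free policy execution, while planning would continue sampling from a posterior initialized at these draws.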

Code & Models

Repositories

qayqaq/lanmdp (PyTorch)

Videos

Learning non-Markovian Decision-Making from State-only Sequences · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning