Markov Balance Satisfaction Improves Performance in Strictly Batch Offline Imitation Learning

Rishabh Agrawal; Nathan Dahlin; Rahul Jain; Ashutosh Nayyar

arXiv:2408.09125 · cs.LG · August 20, 2024

TL;DR

This paper introduces a novel offline imitation learning method that leverages the Markov balance equation and conditional density estimation to improve performance in constrained, real-world scenarios without environment interaction.

Contribution

It proposes a new IL framework using Markov balance and normalizing flows, addressing limitations of existing methods in offline, interaction-free settings.

Findings

01. Outperforms state-of-the-art IL algorithms in numerical experiments

02. Demonstrates effectiveness in Classic Control and MuJoCo environments

03. Provides a robust approach for offline imitation learning without environment interaction

Abstract

Imitation learning (IL) is notably effective for robotic tasks where directly programming behaviors or defining optimal control costs is challenging. In this work, we address a scenario where the imitator relies solely on observed behavior and cannot interact with the environment during learning. It has no supplementary datasets beyond the expert's dataset and no information about the transition dynamics. Unlike state-of-the-art (SOTA) IL methods, this approach tackles the limitations of conventional IL by operating in a more constrained and realistic setting. Our method uses the Markov balance equation and introduces a novel conditional density estimation-based imitation learning framework. It employs conditional normalizing flows for transition dynamics estimation and aims at satisfying a balance equation for the environment. Through a series of numerical…
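The abstract names the Markov balance equation without writing it out. A common way to state such a balance condition, assumed here rather than quoted from the paper, is that the state-action chain induced by a policy π and dynamics T satisfies P(s', a' | s, a) = T(s' | s, a) · π(a' | s'). The sketch below checks this on a small tabular MDP with exact, randomly generated quantities. It is an illustration only, not the authors' method: the paper estimates the conditional densities with conditional normalizing flows from expert data, whereas `chain_kernel`, `balance_residual`, and `recover_policy` are hypothetical helper names operating on known arrays.

```python
import numpy as np


def chain_kernel(T, pi):
    """State-action transition kernel implied by the assumed balance equation.

    T[s, a, s2]  = T(s2 | s, a)   (environment dynamics)
    pi[s2, a2]   = pi(a2 | s2)    (policy)
    Returns P[s, a, s2, a2] = T(s2 | s, a) * pi(a2 | s2).
    """
    return T[:, :, :, None] * pi[None, None, :, :]


def balance_residual(pi, T, P):
    """Largest absolute violation of P = T * pi over all (s, a, s2, a2)."""
    return float(np.max(np.abs(P - chain_kernel(T, pi))))


def recover_policy(T, P):
    """Solve the balance equation for pi in the tabular case.

    Summing both sides over (s, a) gives
    sum P[s, a, s2, a2] = pi[s2, a2] * sum T[s, a, s2],
    so pi is a ratio whenever every state s2 is reachable.
    """
    num = P.sum(axis=(0, 1))            # shape (S, A)
    den = T.sum(axis=(0, 1))[:, None]   # shape (S, 1)
    return num / den


# Illustrative sizes and random quantities (assumptions, not from the paper).
rng = np.random.default_rng(0)
S, A = 4, 3
T = rng.dirichlet(np.ones(S), size=(S, A))     # shape (S, A, S)
pi_expert = rng.dirichlet(np.ones(A), size=S)  # shape (S, A)

# In the strictly batch setting P and T would be estimated from expert
# transitions; here they are exact so the check is easy to follow.
P = chain_kernel(T, pi_expert)
pi_hat = recover_policy(T, P)
```

With exact `T` and `P`, `pi_hat` matches `pi_expert` up to floating-point error, which is the sense in which satisfying the balance equation pins down the expert policy. With estimated densities the residual would generally be nonzero and would act as a training signal instead.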

Taxonomy

Topics: Human Pose and Action Recognition · Reinforcement Learning in Robotics · Human Motion and Animation

Methods: Normalizing Flows