Markov Balance Satisfaction Improves Performance in Strictly Batch Offline Imitation Learning
Rishabh Agrawal, Nathan Dahlin, Rahul Jain, Ashutosh Nayyar

TL;DR
This paper introduces a novel offline imitation learning method that leverages the Markov balance equation and conditional density estimation to improve performance in constrained, real-world scenarios without environment interaction.
Contribution
It proposes a new imitation learning (IL) framework built on the Markov balance equation and conditional normalizing flows, addressing limitations of existing methods in offline, interaction-free settings.
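This summary does not spell out the balance equation itself. As a hedged sketch, assuming the balance is taken over consecutive state-action pairs, one natural form is shown below. The symbols are assumptions introduced here for illustration: T is the state-action transition density induced by the expert, P is the environment dynamics, and π is the policy being learned.

```latex
% Assumed form of a Markov balance over state-action pairs (notation is illustrative):
% the chain on (s, a) factors into the environment dynamics and the policy.
\[
  T(s', a' \mid s, a) \;=\; \pi(a' \mid s')\, P(s' \mid s, a)
\]
% Under this assumption, with estimates \hat{T} and \hat{P} fitted to the expert data
% (for example by conditional density estimation), a policy \pi_\theta can be scored by
% how closely \pi_\theta(a' \mid s')\,\hat{P}(s' \mid s, a) matches \hat{T}(s', a' \mid s, a)
% on the demonstrated transitions.
```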
Findings
Outperforms state-of-the-art IL algorithms in numerical experiments
Demonstrates effectiveness in Classic Control and MuJoCo environments
Provides a robust approach for offline imitation learning without environment interaction
Abstract
Imitation learning (IL) is notably effective for robotic tasks where directly programming behaviors or defining optimal control costs is challenging. In this work, we address a scenario where the imitator relies solely on observed behavior and cannot make environmental interactions during learning. It does not have additional supplementary datasets beyond the expert's dataset nor any information about the transition dynamics. Unlike state-of-the-art (SOTA) IL methods, this approach tackles the limitations of conventional IL by operating in a more constrained and realistic setting. Our method uses the Markov balance equation and introduces a novel conditional density estimation-based imitation learning framework. It employs conditional normalizing flows for transition dynamics estimation and aims at satisfying a balance equation for the environment. Through a series of numerical…
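To make the balance idea concrete, here is a minimal tabular sketch in Python. It is not the paper's implementation: the abstract describes conditional normalizing flows for continuous density estimation, whereas this sketch swaps in simple count-based estimates on a small discrete problem. It assumes the balance means that the transition density over consecutive state-action pairs factors into the policy times the environment dynamics. The function names `estimate_from_demos` and `balance_residual` are hypothetical.

```python
import numpy as np


def estimate_from_demos(transitions, n_states, n_actions):
    """Count-based estimates from expert tuples (s, a, s_next, a_next).

    Returns P_hat with shape (S, A, S) and T_hat with shape (S, A, S, A).
    Rows for (s, a) pairs never seen in the data are left as zeros.
    """
    counts = np.zeros((n_states, n_actions, n_states, n_actions))
    for s, a, s_next, a_next in transitions:
        counts[s, a, s_next, a_next] += 1.0
    totals = counts.sum(axis=(2, 3), keepdims=True)
    T_hat = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    # Marginalizing the next action gives the state-transition estimate.
    P_hat = T_hat.sum(axis=3)
    return P_hat, T_hat


def balance_residual(P, T, pi):
    """Mean absolute gap between T(s', a' | s, a) and pi(a' | s') * P(s' | s, a).

    Assumed convention: P[s, a, s2], T[s, a, s2, a2], pi[s2, a2].
    """
    predicted = P[:, :, :, None] * pi[None, None, :, :]
    return float(np.abs(T - predicted).mean())


if __name__ == "__main__":
    # Toy dynamics and expert policy, made up purely for illustration.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.3, 0.7]]])
    expert = np.array([[0.7, 0.3], [0.1, 0.9]])
    T = P[:, :, :, None] * expert[None, None, :, :]
    uniform = np.full((2, 2), 0.5)
    print("expert residual: ", balance_residual(P, T, expert))
    print("uniform residual:", balance_residual(P, T, uniform))
```

In this toy setup a policy that matches the expert drives the residual to zero, while other policies leave a gap. A learner in the strictly batch setting could therefore search for the policy that minimizes such a residual using only the estimates obtained from the expert's dataset.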
Taxonomy
Topics: Human Pose and Action Recognition · Reinforcement Learning in Robotics · Human Motion and Animation
Methods: Normalizing Flows
