Reward-free World Models for Online Imitation Learning
Shangzhe Li, Zhiao Huang, Hao Su

TL;DR
This paper introduces a reward-free world model approach for online imitation learning that models environment dynamics in latent space, improving stability and performance in complex high-dimensional tasks.
Contribution
It proposes a method that learns environment dynamics in a latent space, without rewards or reconstruction, and optimizes an inverse soft-Q learning objective, improving the stability and effectiveness of online imitation learning in complex environments.
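A reconstruction-free world model is commonly trained by rolling a latent dynamics model forward and matching its predictions to the encoder's embeddings of the observed next states, instead of decoding back to raw observations. The NumPy sketch below illustrates such a multi-step latent consistency loss. The callables `encode` and `dynamics`, the horizon discount `rho`, and the squared-error form are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def latent_consistency_loss(encode, dynamics, obs, actions, rho=0.5):
    """Multi-step latent consistency loss (illustrative sketch, not the paper's loss).

    obs:     array of shape (H + 1, obs_dim), observed states s_0..s_H
    actions: array of shape (H, act_dim), actions a_0..a_{H-1}
    encode, dynamics: hypothetical callables standing in for learned networks.
    """
    z = encode(obs[0])  # embed only the first observation
    loss = 0.0
    for t in range(actions.shape[0]):
        z = dynamics(z, actions[t])   # predict the next latent without decoding
        target = encode(obs[t + 1])   # in training this target would be stop-gradient
        loss += rho ** t * np.mean((z - target) ** 2)  # later steps weighted less
    return float(loss)
```

With an identity encoder and additive dynamics, a trajectory consistent with the model gives zero loss, while a mismatched one is penalized more heavily at early steps than at later ones.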
Findings
Achieves stable, expert-level performance in high-dimensional tasks
Outperforms existing methods on benchmarks like DMControl, MyoSuite, and ManiSkill2
Demonstrates the effectiveness of reward-free latent dynamics modeling
Abstract
Imitation learning (IL) enables agents to acquire skills directly from expert demonstrations, providing a compelling alternative to reinforcement learning. However, prior online IL approaches struggle with tasks characterized by high-dimensional inputs and complex dynamics. In this work, we propose a novel approach to online imitation learning that leverages reward-free world models. Our method learns environmental dynamics entirely in a latent space without reconstruction, enabling efficient and accurate modeling. We adopt the inverse soft-Q learning objective, reformulating the optimization process in the Q-policy space to mitigate the instability associated with traditional optimization in the reward-policy space. By employing a learned latent dynamics model and planning for control, our approach consistently achieves stable, expert-level performance in tasks with…
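The idea of optimizing in the Q-policy space can be illustrated with an IQ-Learn-style loss: the reward is never modeled explicitly but recovered from the Q-function as r(s, a) = Q(s, a) − γV(s'), where V is the soft value and the policy is the softmax of Q. The NumPy sketch below assumes discrete actions, a χ²-style regularizer φ(r) = r − r²/(4α), and the hypothetical helper names `soft_value` and `iq_loss`. The paper works with continuous actions and learned latent states, so this is a simplified illustration rather than its implementation.

```python
import numpy as np

def soft_value(q, temp=1.0):
    """Soft value V(s) = temp * logsumexp(Q(s, .) / temp) over discrete actions."""
    z = q / temp
    m = z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    return temp * (m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1)))

def iq_loss(q_e, a_e, q_next_e, done_e, q_mix, q_next_mix, done_mix,
            gamma=0.99, temp=1.0, chi2_alpha=0.5):
    """Negated IQ-Learn-style objective (hypothetical simplified sketch).

    q_e, q_next_e:     Q-values at expert states and next states, shape (B, A)
    a_e, done_e:       expert actions (B,) and terminal flags (B,)
    q_mix, q_next_mix: Q-values at states drawn from expert and policy data
    """
    # Implicit reward on expert transitions: r = Q(s, a) - gamma * V(s')
    v_next_e = soft_value(q_next_e, temp)
    q_sa = q_e[np.arange(len(a_e)), a_e]
    r_e = q_sa - gamma * (1.0 - done_e) * v_next_e
    # Value term on mixed states: E[V(s) - gamma * V(s')]
    v_mix = soft_value(q_mix, temp)
    v_next_mix = soft_value(q_next_mix, temp)
    value_term = np.mean(v_mix - gamma * (1.0 - done_mix) * v_next_mix)
    # phi(r) = r - r^2 / (4 * chi2_alpha); the loss is the negated objective
    chi2 = np.mean(r_e ** 2) / (4.0 * chi2_alpha)
    return float(-np.mean(r_e) + chi2 + value_term)
```

Because the only learned quantity here is Q, a single gradient step on this loss updates the implicit reward and the softmax policy together, which is the sense in which the optimization avoids alternating between a reward model and a policy.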
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Advanced Vision and Imaging
Methods: Sparse Evolutionary Training
