Lifting Embodied World Models for Planning and Control

Alex N. Wang; Trevor Darrell; Pavel Izmailov; Yutong Bai; Amir Bar

arXiv:2604.26182 · cs.CV · April 30, 2026

TL;DR

This paper introduces a lifted world model framework that uses a lightweight policy to map high-level actions to low-level joint sequences, improving planning efficiency and accuracy for complex embodied agents.

Contribution

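The paper trains a lightweight policy that translates high-level waypoint actions into sequences of low-level joint actions, then composes that policy with a frozen world model. Planning happens in the small waypoint space instead of the full joint space, which makes search cheaper and supports generalization in complex embodied systems.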

Findings

01. The lifted world model reduces joint error by a factor of 3.8 relative to low-level search.

02. Planning with the lifted model is more computationally efficient than planning directly over low-level actions.

03. Planning with high-level waypoint actions generalizes well to unseen environments.

Abstract

World models of embodied agents predict future observations conditioned on an action taken by the agent. For complex embodiments, action spaces are high-dimensional and difficult to specify: for example, precisely controlling a human agent requires specifying the motion of each joint. This makes the world model hard to control and expensive to plan with, because search-based methods such as the cross-entropy method (CEM) scale poorly with action dimensionality. To address this issue, we train a lightweight policy that maps high-level actions to sequences of low-level joint actions. Composing this policy with the frozen world model produces a lifted world model that predicts a sequence of future observations from a single high-level action. We instantiate this framework for a human-like embodiment, defining the high-level action space as a small set of 2D waypoints annotated on the current observation frame, each…
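The composition described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the world model here is a stand-in with linear dynamics, the policy is hand-written rather than learned, and every name and constant (`JOINT_DIM`, `HORIZON`, `world_model`, `policy`, `lifted_world_model`, `cem_plan`) is a hypothetical choice made only to show the structure. The point is that CEM searches over a single 2D waypoint while the frozen model still consumes full joint actions.

```python
import numpy as np

JOINT_DIM = 48   # hypothetical low-level action size (assumption)
HORIZON = 8      # hypothetical low-level steps per high-level action (assumption)

_rng = np.random.default_rng(0)
# Toy stand-in for a frozen world model: a fixed linear map from a joint
# action to a 2D displacement of the observation. Not the paper's model.
P = _rng.normal(size=(2, JOINT_DIM))
P_PINV = np.linalg.pinv(P)


def world_model(obs, joint_action):
    """Frozen low-level model: next observation given one joint action."""
    return obs + P @ joint_action


def policy(obs, waypoint):
    """Hand-written stand-in for the learned policy: maps one 2D waypoint
    to a sequence of HORIZON joint actions that moves toward it."""
    step = (waypoint - obs) / HORIZON
    return np.tile(P_PINV @ step, (HORIZON, 1))  # shape (HORIZON, JOINT_DIM)


def lifted_world_model(obs, waypoint):
    """Compose policy and frozen world model: one high-level action in,
    a sequence of predicted observations out."""
    frames = []
    for joint_action in policy(obs, waypoint):
        obs = world_model(obs, joint_action)
        frames.append(obs)
    return np.stack(frames)  # shape (HORIZON, 2)


def cem_plan(obs, goal, iters=10, samples=64, elites=8, seed=0):
    """Cross-entropy method over the 2D waypoint space only."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(2), np.ones(2)
    for _ in range(iters):
        candidates = mean + std * rng.normal(size=(samples, 2))
        costs = np.array([
            np.linalg.norm(lifted_world_model(obs, w)[-1] - goal)
            for w in candidates
        ])
        best = candidates[np.argsort(costs)[:elites]]
        mean, std = best.mean(axis=0), best.std(axis=0)
    return mean
```

In this toy setup the planner optimizes 2 numbers per high-level action, whereas a direct low-level search over the same horizon would optimize `HORIZON * JOINT_DIM = 384` numbers. Calling `cem_plan(np.zeros(2), np.array([1.0, -0.5]))` returns a waypoint whose predicted final observation lies near the goal.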
