Lifting Embodied World Models for Planning and Control
Alex N. Wang, Trevor Darrell, Pavel Izmailov, Yutong Bai, Amir Bar

TL;DR
This paper introduces a lifted world model framework that uses a lightweight policy to map high-level actions to low-level joint sequences, improving planning efficiency and accuracy for complex embodied agents.
Contribution
It proposes composing a lightweight policy, which turns high-level waypoint actions into low-level joint actions, with a frozen world model, enabling efficient planning and better generalization in complex embodied systems.
Findings
The lifted world model reduces joint error by a factor of 3.8 compared with low-level search.
The approach is more computationally efficient than direct low-level planning.
Using high-level waypoint actions, it generalizes well to unseen environments.
Abstract
World models of embodied agents predict future observations conditioned on an action taken by the agent. For complex embodiments, action spaces are high-dimensional and difficult to specify: for example, precisely controlling a human agent requires specifying the motion of each joint. This makes the world model hard to control and expensive to plan with, as search-based methods such as the cross-entropy method (CEM) scale poorly with action dimensionality. To address this issue, we train a lightweight policy that maps high-level actions to sequences of low-level joint actions. Composing this policy with the frozen world model produces a lifted world model that predicts a sequence of future observations from a single high-level action. We instantiate this framework for a human-like embodiment, defining the high-level action space as a small set of 2D waypoints annotated on the current observation frame, each…
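The core idea, composing a low-level policy with a frozen world model so that a planner such as CEM searches over a few waypoint coordinates instead of every joint at every step, can be sketched in a toy setting. This is a hypothetical illustration, not the authors' implementation. The 2D "observation", the linear `world_model`, the hand-coded `low_level_policy` (a stand-in for the trained policy), the names `lifted_world_model` and `cem_plan`, and all sizes and hyperparameters are assumptions. In the paper the world model predicts future observations and the waypoints are annotated on the current frame; here both are reduced to 2D points.

```python
import numpy as np

# Toy, hypothetical setup (NOT the paper's model): the "observation" is a 2D
# position and each low-level action is a J-dimensional joint command.
J = 20
_rng = np.random.default_rng(0)
B = _rng.normal(size=(2, J)) / np.sqrt(J)  # fixed linear map: joints -> 2D motion
B_PINV = np.linalg.pinv(B)                 # used only by the stand-in policy


def world_model(obs, joint_action):
    """Frozen toy world model: next observation from one low-level joint action."""
    return obs + B @ joint_action


def low_level_policy(obs, waypoint, max_step=0.5):
    """Stand-in for the learned policy: a joint command that steps toward a waypoint."""
    delta = waypoint - obs
    dist = np.linalg.norm(delta)
    if dist > max_step:
        delta = delta * (max_step / dist)
    return B_PINV @ delta  # B @ (B_PINV @ delta) == delta since B has full row rank


def lifted_world_model(obs, waypoints, steps_per_waypoint=10):
    """Policy composed with the frozen world model: one high-level action -> many frames."""
    frames = []
    for wp in waypoints:
        for _ in range(steps_per_waypoint):
            obs = world_model(obs, low_level_policy(obs, wp))
            frames.append(obs)
    return np.array(frames)


def cem_plan(obs, goal, n_waypoints=2, iters=20, pop=64, n_elite=8, seed=0):
    """Cross-entropy method over the small waypoint space (n_waypoints * 2 dims).

    Returns the lowest-cost waypoint set sampled and its cost.
    """
    rng = np.random.default_rng(seed)
    dim = n_waypoints * 2
    mean, std = np.tile(obs, n_waypoints), np.full(dim, 2.0)
    best, best_cost = None, np.inf
    for _ in range(iters):
        samples = mean + std * rng.normal(size=(pop, dim))
        # Cost: distance between the last predicted frame and the goal.
        costs = np.array([
            np.linalg.norm(lifted_world_model(obs, s.reshape(n_waypoints, 2))[-1] - goal)
            for s in samples
        ])
        order = np.argsort(costs)
        if costs[order[0]] < best_cost:
            best = samples[order[0]].reshape(n_waypoints, 2).copy()
            best_cost = costs[order[0]]
        elites = samples[order[:n_elite]]
        # Refit the sampling Gaussian to the elites; the small floor keeps exploring.
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return best, best_cost


start, goal = np.zeros(2), np.array([3.0, 1.0])
plan, cost = cem_plan(start, goal)
frames = lifted_world_model(start, plan)  # plan: (2, 2), frames: (20, 2)
print(plan.shape, frames.shape, round(float(cost), 4))
```

In this toy, CEM samples a 4-dimensional space (two 2D waypoints), whereas running CEM directly over the 20-step, 20-joint low-level sequence would mean sampling 400 dimensions, which is the scaling problem the abstract describes. The sketch returns the best sample seen rather than the final Gaussian mean, one of several common CEM variants.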
