LOME: Learning Human-Object Manipulation with Action-Conditioned Egocentric World Model

Quankai Gao; Jiawei Yang; Qiangeng Xu; Le Chen; Yue Wang

arXiv:2603.27449 · cs.CV · March 31, 2026

TL;DR

LOME is an egocentric world model that generates realistic human-object interaction videos conditioned on an input image, a text prompt, and per-frame human actions, improving generalization and physical realism in manipulation tasks.

Contribution

LOME introduces a training method that jointly estimates spatial human actions and environment context, improving both how accurately generated videos follow the input actions and how physically realistic they look.

Findings

01. LOME outperforms state-of-the-art methods in temporal consistency and motion control.

02. The model generalizes well to unseen scenarios.

03. LOME produces realistic physical effects, such as liquid flow during pouring.

Abstract

Learning human-object manipulation presents significant challenges due to the fine-grained and contact-rich nature of the motions involved. Traditional physics-based animation requires extensive modeling and manual setup; more importantly, it neither generalizes well across diverse object morphologies nor scales effectively to real-world environments. To address these limitations, we introduce LOME, an egocentric world model that can generate realistic human-object interactions as videos conditioned on an input image, a text prompt, and per-frame human actions, including both body poses and hand gestures. LOME injects strong and precise action guidance into object manipulation by jointly estimating spatial human actions and the environment contexts during training. After finetuning a pretrained video generative model on videos of diverse egocentric human-object interactions, LOME…

Code & Models

Repositories

zerg-overmind/LOME (GitHub)