ManiVideo: Generating Hand-Object Manipulation Video with Dexterous and Generalizable Grasping

Youxin Pang; Ruizhi Shao; Jiajun Zhang; Hanzhang Tu; Yun Liu; Boyao Zhou; Hongwen Zhang; Yebin Liu

arXiv:2412.16212·cs.CV·December 24, 2024

TL;DR

ManiVideo is a new method that generates realistic, temporally consistent videos of hand-object manipulation by learning 3D occlusion relationships and leveraging large-scale 3D object datasets for better generalization.

Contribution

The paper introduces the multi-layer occlusion (MLO) representation and integrates it into a UNet architecture, improving 3D consistency and generalizable grasping in manipulation videos.

Findings

01 Outperforms existing state-of-the-art methods.

02 Produces videos with plausible hand-object interactions.

03 Demonstrates effective generalization to various objects.

Abstract

In this paper, we introduce ManiVideo, a novel method for generating consistent and temporally coherent bimanual hand-object manipulation videos from given motion sequences of hands and objects. The core idea of ManiVideo is the construction of a multi-layer occlusion (MLO) representation that learns 3D occlusion relationships from occlusion-free normal maps and occlusion confidence maps. By embedding the MLO structure into the UNet in two forms, the model enhances the 3D consistency of dexterous hand-object manipulation. To further achieve the generalizable grasping of objects, we integrate Objaverse, a large-scale 3D object dataset, to address the scarcity of video data, thereby facilitating the learning of extensive object consistency. Additionally, we propose an innovative training strategy that effectively integrates multiple datasets, supporting downstream tasks such as…
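The abstract mentions occlusion confidence maps without defining them here. As a rough illustration only, the sketch below assumes each layer (for example a hand or an object) comes with a per-pixel depth map, and marks a layer as confident (1.0) wherever it is the nearest surface. The function name `occlusion_confidence`, the depth-based rule, and the toy data are all assumptions made for this example, not the paper's formulation.

```python
from typing import List, Optional

Depth = Optional[float]  # None means the layer does not cover this pixel


def occlusion_confidence(layers: List[List[List[Depth]]]) -> List[List[List[float]]]:
    """Hypothetical helper: for each layer, return a map that is 1.0 where the
    layer is the nearest surface at a pixel and 0.0 where it is absent or
    hidden behind another layer. This is an assumed rule, not the paper's."""
    height = len(layers[0])
    width = len(layers[0][0])
    conf = [[[0.0] * width for _ in range(height)] for _ in layers]
    for y in range(height):
        for x in range(width):
            # Collect (depth, layer index) for every layer covering this pixel.
            covered = [
                (layer[y][x], i)
                for i, layer in enumerate(layers)
                if layer[y][x] is not None
            ]
            if covered:
                _, nearest = min(covered)  # smallest depth wins
                conf[nearest][y][x] = 1.0
    return conf


# Toy 1x3 image with two hypothetical layers; smaller depth is closer to the camera.
hand = [[1.0, 2.0, None]]
obj = [[None, 1.5, 3.0]]
conf = occlusion_confidence([hand, obj])
```

In this toy case the hand is visible only at the first pixel, while the object occludes it at the second pixel and appears alone at the third. A learned model would presumably use soft confidences rather than this hard nearest-surface rule.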

Taxonomy

Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Human Motion and Animation