Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction
Junuk Cha, Jihyeon Kim, Jae Shin Yoon, Seungryul Baek

TL;DR
This paper presents Text2HOI, a novel method for generating realistic 3D hand-object interactions from text prompts, addressing data scarcity by decomposing the task into contact and motion generation with a focus on physical plausibility and diversity.
Contribution
The paper introduces a two-stage approach combining a VAE-based contact generator and a Transformer-based diffusion model for text-guided 3D hand-object interaction synthesis, with a new hand refiner for stability.
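As a rough illustration of that two-stage flow, the sketch below first samples a per-point contact probability map from a latent code, then runs a generic DDPM-style reverse loop conditioned on the text and the contact map. Everything in it is an assumption made for illustration: the function names, tensor shapes, random untrained weights, and the noise-prediction update are hypothetical stand-ins, not the authors' networks or sampler, and the hand refiner is omitted.

```python
import numpy as np

def sample_contact_map(text_emb, point_feats, rng, latent_dim=8):
    """Stage 1 stand-in: decode a latent sample into per-point contact probabilities."""
    z = rng.standard_normal(latent_dim)                 # latent drawn from the prior N(0, I)
    cond = np.concatenate([z, text_emb])                # latent + text condition
    w_cond = rng.standard_normal(cond.size) * 0.1       # hypothetical untrained weights
    w_pts = rng.standard_normal(point_feats.shape[1]) * 0.1
    logits = point_feats @ w_pts + cond @ w_cond        # one logit per object point
    return 1.0 / (1.0 + np.exp(-logits))                # sigmoid -> probabilities

def toy_denoiser(x, t, text_emb, contact):
    """Placeholder for the Transformer denoiser; untrained, returns a fake noise estimate."""
    bias = contact.mean() + text_emb.mean()
    return 0.1 * x + 0.01 * bias

def sample_motion(text_emb, contact, rng, frames=16, pose_dim=12, steps=50):
    """Stage 2 stand-in: generic DDPM ancestral sampling over a motion tensor."""
    betas = np.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal((frames, pose_dim))         # start from pure noise
    for t in range(steps - 1, -1, -1):
        eps_hat = toy_denoiser(x, t, text_emb, contact)
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:                                       # no noise added at the final step
            x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(0)
text_emb = rng.standard_normal(4)                       # hypothetical text embedding
points = rng.standard_normal((100, 3))                  # hypothetical object point cloud
contact = sample_contact_map(text_emb, points, rng)
motion = sample_motion(text_emb, contact, rng)
print(contact.shape, motion.shape)
```

Because the latent code and the diffusion noise are both resampled on each call, the same prompt and object can yield different contact maps and motions, which is the intuition behind the diversity claim.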
Findings
Generates more realistic and diverse interactions than baselines.
Applicable to unseen objects with generalizable contact modeling.
Produces physically plausible hand-object motions from text prompts.
Abstract
This paper introduces the first text-guided work for generating sequences of hand-object interaction in 3D. The main challenge arises from the lack of labeled data: existing ground-truth datasets are nowhere near generalizable in interaction type and object category, which inhibits the modeling of diverse 3D hand-object interaction with the correct physical implication (e.g., contacts and semantics) from text prompts. To address this challenge, we propose to decompose the interaction generation task into two subtasks: hand-object contact generation and hand-object motion generation. For contact generation, a VAE-based network takes a text prompt and an object mesh as input and generates the probability of contact between the surfaces of the hands and the object during the interaction. The network learns a variety of local geometric structures of diverse objects that is independent of the…
Taxonomy
Topics: Human Motion and Animation · Human Pose and Action Recognition · Hand Gesture Recognition Systems
Methods: Diffusion
