HANDI: Hand-Centric Text-and-Image Conditioned Video Generation

Yayuan Li; Zhi Cao; Jason J. Corso

arXiv:2412.04189·cs.CV·July 15, 2025

TL;DR

HANDI is a novel diffusion-based video generation method that emphasizes hand-centric actions, automatically identifying motion regions and refining hand poses to improve action clarity in complex environments.

Contribution

The paper introduces an automatic motion area generation method guided by visual context and text prompts, along with a Hand Refinement Loss for better hand pose consistency.

Findings

1. Significant improvement in hand motion clarity over state-of-the-art methods
2. Effective on challenging datasets like EpicKitchens and Ego4D
3. Demonstrates robustness across diverse environments and actions

Abstract

Despite the recent strides in video generation, state-of-the-art methods still struggle with elements of visual detail. One particularly challenging case is the class of videos in which the intricate motion of the hand coupled with a mostly stable and otherwise distracting environment is necessary to convey the execution of some complex action and its effects. To address these challenges, we introduce a new method for video generation that focuses on hand-centric actions. Our diffusion-based method incorporates two distinct innovations. First, we propose an automatic method to generate the motion area -- the region in the video in which the detailed activities occur -- guided by both the visual context and the action text prompt, rather than assuming this region can be provided manually as is now commonplace. Second, we introduce a critical Hand Refinement Loss to guide the diffusion…

Taxonomy

Topics: Multimedia Communication and Technology · Educational Tools and Methods · Video Analysis and Summarization

Methods: Diffusion · Focus