Hand1000: Generating Realistic Hands from Text with Only 1,000 Images
Haozhuo Zhang, Bin Zhu, Yu Cao, Yanbin Hao

TL;DR
Hand1000 is a novel method that generates realistic, anatomically accurate hand images from text using only 1,000 training samples, by improving the model's understanding of hand anatomy and the alignment between textual descriptions and hand gestures.
Contribution
The paper introduces Hand1000, a new approach for text-to-hand image generation with limited data, including a three-stage training process and a new dataset for this task.
Findings
Outperforms existing models in hand image accuracy
Produces more anatomically correct hand images
Effectively aligns textual descriptions with visual hand representations
Abstract
Text-to-image generation models have achieved remarkable advancements in recent years, aiming to produce realistic images from textual descriptions. However, these models often struggle with generating anatomically accurate representations of human hands. The resulting images frequently exhibit issues such as incorrect numbers of fingers, unnatural twisting or interlacing of fingers, or blurred and indistinct hands. These issues stem from the inherent complexity of hand structures and the difficulty in aligning textual descriptions with precise visual depictions of hands. To address these challenges, we propose a novel approach named Hand1000 that enables the generation of realistic hand images with target gesture using only 1,000 training samples. The training of Hand1000 is divided into three stages with the first stage aiming to enhance the model's understanding of hand anatomy by…
Taxonomy
Topics: Face recognition and analysis · Hand Gesture Recognition Systems · Human Pose and Action Recognition
Methods: Diffusion
