Hi5: Synthetic Data for Inclusive, Robust, Hand Pose Estimation
Masum Hasan, Cengiz Ozel, Nina Long, Alexander Martin, Samuel Potter, Tariq Adnan, Sangwu Lee, Ehsan Hoque

TL;DR
This paper introduces Hi5, a large synthetic hand pose dataset rendered with high-fidelity 3D hand models to improve inclusivity, robustness, and accuracy in hand pose estimation for affective computing.
Contribution
It presents a cost-effective synthetic data generation method that enhances diversity and realism, addressing limitations of real-world datasets in hand pose estimation.
Findings
Models trained on Hi5 perform comparably to those trained on real data.
Hi5 improves robustness to occlusions and skin tone variations.
Synthetic data enables more inclusive and expressive gesture recognition.
Abstract
Hand pose estimation plays a vital role in capturing subtle nonverbal cues essential for understanding human affect. However, collecting diverse, expressive real-world data remains challenging due to labor-intensive manual annotation that often underrepresents demographic diversity and natural expressions. To address this issue, we introduce a cost-effective approach to generating synthetic data using high-fidelity 3D hand models and a wide range of affective hand poses. Our method includes varied skin tones, genders, dynamic environments, realistic lighting conditions, and diverse naturally occurring gesture animations. The resulting dataset, Hi5, contains 583,000 pose-annotated images, carefully balanced to reflect natural diversity and emotional expressiveness. Models trained exclusively on Hi5 achieve performance comparable to human-annotated datasets, exhibiting superior robustness…
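The abstract describes a dataset balanced across skin tone, gender, environment, lighting, and gesture animation. Below is a minimal sketch of one way to schedule such attributes. It assumes a renderer that consumes one configuration per image. The attribute vocabularies, the `RenderConfig` type, and `balanced_schedule` are hypothetical illustrations, not the Hi5 pipeline.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderConfig:
    """Hypothetical per-image render settings (not the Hi5 schema)."""
    skin_tone: str
    gender: str
    environment: str
    lighting: str
    animation: str


# Hypothetical attribute vocabularies; the real Hi5 asset lists are not given here.
ATTRIBUTES = {
    "skin_tone": ["tone_1", "tone_2", "tone_3", "tone_4", "tone_5", "tone_6"],
    "gender": ["female", "male"],
    "environment": ["indoor", "outdoor", "studio"],
    "lighting": ["daylight", "warm", "low_light"],
    "animation": ["wave", "point", "thumbs_up", "fist"],
}


def balanced_schedule(n, seed=0):
    """Return n configs in which every value of each attribute appears
    floor(n/k) or ceil(n/k) times (k = number of values for that attribute)."""
    rng = random.Random(seed)
    columns = {}
    for name, values in ATTRIBUTES.items():
        # Cycle through the values so counts differ by at most one,
        # then shuffle so attributes are paired at random across images.
        column = [values[i % len(values)] for i in range(n)]
        rng.shuffle(column)
        columns[name] = column
    return [
        RenderConfig(**{name: columns[name][i] for name in ATTRIBUTES})
        for i in range(n)
    ]


if __name__ == "__main__":
    schedule = balanced_schedule(1200)
    # Each of the six hypothetical skin tones appears exactly 200 times.
    print(Counter(cfg.skin_tone for cfg in schedule))
```

This balances each attribute's marginal distribution independently instead of enumerating the full Cartesian product, which for these example vocabularies would already require 6 × 2 × 3 × 3 × 4 = 432 combinations per cycle.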
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Video Analysis and Summarization
