Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-pixel Part Segmentation
Zicong Fan, Adrian Spurr, Muhammed Kocabas, Siyu Tang, Michael J. Black, Otmar Hilliges

TL;DR
This paper introduces DIGIT, a novel method that jointly estimates 3D poses of interacting hands from a single image by leveraging per-pixel part segmentation, significantly improving accuracy over previous approaches.
Contribution
The paper presents a unified approach that integrates per-pixel segmentation with pose estimation, enhancing accuracy in challenging scenarios of interacting hands.
Findings
Achieves state-of-the-art results on the InterHand2.6M dataset.
Demonstrates the importance of pixel ownership modeling in hand pose estimation.
Provides detailed ablation studies validating the method's effectiveness.
Abstract
In natural conversation and interaction, our hands often overlap or are in contact with each other. Due to the homogeneous appearance of hands, this makes estimating the 3D pose of interacting hands from images difficult. In this paper we demonstrate that self-similarity, and the resulting ambiguities in assigning pixel observations to the respective hands and their parts, is a major cause of the final 3D pose error. Motivated by this insight, we propose DIGIT, a novel method for estimating the 3D poses of two interacting hands from a single monocular image. The method consists of two interwoven branches that process the input imagery into a per-pixel semantic part segmentation mask and a visual feature volume. In contrast to prior work, we do not decouple the segmentation from the pose estimation stage, but rather leverage the per-pixel probabilities directly in the downstream pose…
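The idea of feeding per-pixel part probabilities directly into pose estimation, rather than a hard segmentation mask, can be illustrated with a small sketch. This is not the authors' implementation: the fusion by channel-wise concatenation, the function names `softmax` and `fuse`, the nested-list layout, and the toy sizes (2 feature channels, 3 classes) are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(features, seg_logits):
    """Append per-pixel part probabilities to per-pixel visual features.

    features:   H x W x F nested lists (visual feature volume)
    seg_logits: H x W x C nested lists (segmentation branch output)
    returns:    H x W x (F + C) nested lists

    Concatenation is an assumed fusion operator, chosen for simplicity.
    """
    fused = []
    for feat_row, logit_row in zip(features, seg_logits):
        fused.append(
            [feat + softmax(logits) for feat, logits in zip(feat_row, logit_row)]
        )
    return fused


# Toy 1 x 2 "image" with 2 feature channels and 3 hypothetical part classes.
features = [[[0.5, -1.0], [2.0, 0.0]]]
seg_logits = [[[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]]]

fused = fuse(features, seg_logits)
```

In this toy input the first pixel has equal logits, so its uncertainty is carried downstream as a uniform probability vector instead of being collapsed into a single label. The second pixel is dominated by one class. Keeping the soft probabilities is what lets a downstream estimator weigh ambiguous pixels differently from confident ones.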
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Advanced Neural Network Applications
