Contrastive Representation Learning for Hand Shape Estimation
Christian Zimmermann, Max Argus, Thomas Brox

TL;DR
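This paper improves monocular hand shape estimation by learning representations with momentum contrastive learning on a new dataset, HanCo, using background removal and multi-view data to build more diverse instance pairs. It reports a 4.7% reduction in mesh error and a 3.6% improvement in F-score over an ImageNet-pretrained baseline.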
Contribution
It introduces HanCo, a structured hand image dataset, and demonstrates how contrastive learning with background removal and multi-view data improves hand shape estimation.
Findings
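4.7% reduction in mesh error compared to an ImageNet-pretrained baseline
3.6% improvement in F-score compared to the same baseline
Background removal and multi-view information yield a representation better suited to hand shape estimation than established contrastive learning methods alone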
Abstract
This work presents improvements in monocular hand shape estimation by building on top of recent advances in unsupervised learning. We extend momentum contrastive learning and contribute a structured collection of hand images, well suited for visual representation learning, which we call HanCo. We find that the representation learned by established contrastive learning methods can be improved significantly by exploiting advanced background removal techniques and multi-view information. These allow us to generate more diverse instance pairs than those obtained by augmentations commonly used in exemplar-based approaches. Our method leads to a more suitable representation for the hand shape estimation task and shows a 4.7% reduction in mesh error and a 3.6% improvement in F-score compared to an ImageNet-pretrained baseline. We make our benchmark dataset publicly available, to encourage…
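The abstract's idea of drawing positive pairs from different views of the same hand, rather than from two augmentations of one image, can be illustrated with a minimal sketch of a momentum-contrastive objective. This is not the authors' implementation: the function names, the list-of-views data layout, the temperature of 0.2, and the momentum of 0.999 are all assumptions chosen for illustration, and embeddings are plain Python lists rather than network outputs.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(query, positive_key, negative_keys, temperature=0.2):
    """InfoNCE loss for one query: negative log-softmax of the positive logit.

    The temperature value is an assumed hyperparameter, not taken from the paper.
    """
    q = normalize(query)
    logits = [dot(q, normalize(positive_key)) / temperature]
    logits += [dot(q, normalize(k)) / temperature for k in negative_keys]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


def momentum_update(key_params, query_params, momentum=0.999):
    """Key-encoder weights follow the query encoder as an exponential moving average."""
    return [momentum * k + (1.0 - momentum) * q
            for k, q in zip(key_params, query_params)]


def sample_positive_pair(views, rng):
    """Pick two different views of the same hand pose (hypothetical data layout)."""
    i, j = rng.sample(range(len(views)), 2)
    return views[i], views[j]
```

In this sketch, `sample_positive_pair` would be called on the set of camera views (optionally with replaced backgrounds) recorded for one time step, the two samples would be encoded by the query and key encoders respectively, and `info_nce_loss` would score the pair against keys from other hand poses. The loss is small when the query is closer to its positive key than to the negatives, and large otherwise.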
Taxonomy
Methods: Contrastive Learning
