HaMuCo: Hand Pose Estimation via Multiview Collaborative Self-Supervised Learning
Xiaozheng Zheng, Chao Wen, Zhou Xue, Pengfei Ren, Jingyu Wang

TL;DR
HaMuCo introduces a self-supervised multi-view learning framework for 3D hand pose estimation that reduces reliance on annotated data by leveraging cross-view consistency and collaborative learning.
Contribution
The paper proposes a novel cross-view interaction network that enhances self-supervised hand pose estimation by addressing noisy labels and groupthink effects.
Findings
Achieves state-of-the-art results on multi-view self-supervised hand pose estimation.
Outperforms previous methods in multi-view hand pose estimation tasks.
Effective in reducing annotation dependency for 3D hand pose estimation.
Abstract
Recent advancements in 3D hand pose estimation have shown promising results, but its effectiveness has primarily relied on the availability of large-scale annotated datasets, the creation of which is a laborious and costly process. To alleviate the label-hungry limitation, we propose a self-supervised learning framework, HaMuCo, that learns a single-view hand pose estimator from multi-view pseudo 2D labels. However, one of the main challenges of self-supervised learning is the presence of noisy labels and the "groupthink" effect from multiple views. To overcome these issues, we introduce a cross-view interaction network that distills the single-view estimator by utilizing the cross-view correlated features and enforcing multi-view consistency to achieve collaborative learning. Both the single-view estimator and the cross-view interaction network are trained jointly in an end-to-end…
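As a rough illustration of what "enforcing multi-view consistency" can mean in general, the sketch below maps each view's predicted 3D joints into a shared world frame and penalizes their spread around the cross-view mean. This is not HaMuCo's actual loss: the function names, the 21-joint layout used in the usage note, and the camera convention (x_cam = R x_world + t) are all assumptions made for the example.

```python
import numpy as np


def to_world(joints_cam, R, t):
    """Map (J, 3) camera-frame joints to the world frame.

    Assumes the (hypothetical) convention x_cam = R @ x_world + t,
    so x_world = R.T @ (x_cam - t); with row vectors this is (x_cam - t) @ R.
    """
    return (joints_cam - t) @ R


def multiview_consistency_loss(joints_per_view, rotations, translations):
    """Mean squared distance of each view's world-frame joints from the cross-view mean.

    joints_per_view: list of (J, 3) arrays, one per camera view.
    rotations:       list of (3, 3) rotation matrices, one per view.
    translations:    list of (3,) translation vectors, one per view.
    """
    world = np.stack([
        to_world(j, R, t)
        for j, R, t in zip(joints_per_view, rotations, translations)
    ])                                              # shape (V, J, 3)
    consensus = world.mean(axis=0, keepdims=True)   # shape (1, J, 3)
    return float(np.mean(np.sum((world - consensus) ** 2, axis=-1)))
```

If every view predicts the same underlying hand (for example, 21 joints per view), the loss is zero up to floating-point error; any disagreement between views makes it positive. A plain mean is used as the consensus only for simplicity. It gives every view equal weight, so it does nothing by itself about the noisy-label and "groupthink" issues the abstract raises.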
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Robot Manipulation and Learning
