Leveraging 2D Masked Reconstruction for Domain Adaptation of 3D Pose Estimation

Hansoo Park; Chanwoo Kim; Jihyeon Kim; Hoseong Cho; Nhat Nguyen Bao Truong; Taehwan Kim; Seungryul Baek

arXiv:2501.08408·cs.CV·February 26, 2025

Open Access

TL;DR

This paper introduces an unsupervised domain adaptation framework for 3D pose estimation using masked image modeling, improving accuracy across diverse datasets without requiring labeled data for new domains.

Contribution

It proposes a novel domain adaptation method leveraging masked image modeling and attention regularization to enhance 3D pose estimation across different data distributions.

Findings

01. Achieved state-of-the-art accuracy on multiple datasets.

02. Effective in cross-domain human and hand pose estimation.

03. Utilized unlabeled data to improve model robustness.

Abstract

RGB-based 3D pose estimation methods have been successful with the development of deep learning and the emergence of high-quality 3D pose datasets. However, most existing methods do not operate well for testing images whose distribution is far from that of training data. This problem might be alleviated by involving diverse data during training, but it is non-trivial to collect such diverse data with corresponding labels (i.e., 3D pose). In this paper, we introduce an unsupervised domain adaptation framework for 3D pose estimation that utilizes unlabeled data in addition to labeled data via a masked image modeling (MIM) framework. Foreground-centric reconstruction and attention regularization are further proposed to increase the effectiveness of unlabeled…
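To make the masked image modeling idea in the abstract concrete, here is a minimal, generic sketch in plain Python. It is not the authors' implementation. The function names, the 2x2 patch size, the 0.75 mask ratio, and the mean-fill "predictor" are all illustrative assumptions that do not come from the paper. It shows only the core mechanism: split an image into patches, hide a random subset, and score a reconstruction on the hidden patches alone, which needs no pose labels.

```python
import random


def patchify(image, patch):
    """Split an H x W grid (list of rows) into flattened, non-overlapping patch x patch blocks."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def random_mask(num_patches, mask_ratio, rng):
    """Choose the set of patch indices that are hidden from the model."""
    num_masked = int(num_patches * mask_ratio)
    return set(rng.sample(range(num_patches), num_masked))


def masked_reconstruction_loss(target_patches, predicted_patches, masked):
    """Mean squared error computed over the masked patches only."""
    total, count = 0.0, 0
    for i in masked:
        for t, p in zip(target_patches[i], predicted_patches[i]):
            total += (t - p) ** 2
            count += 1
    return total / count if count else 0.0


# Toy 4x4 "image" with 2x2 patches (illustrative values only).
rng = random.Random(0)
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
patches = patchify(image, patch=2)
masked = random_mask(len(patches), mask_ratio=0.75, rng=rng)

# Hypothetical stand-in for a learned decoder: fill every patch with the
# mean pixel value of the visible (unmasked) patches.
visible = [v for i, p in enumerate(patches) if i not in masked for v in p]
fill = sum(visible) / len(visible)
predicted = [[fill] * len(p) for p in patches]

loss = masked_reconstruction_loss(patches, predicted, masked)
print(f"masked patches: {sorted(masked)}, reconstruction loss: {loss:.3f}")
```

In a real system the mean-fill stand-in would be replaced by a trained encoder and decoder, and this label-free loss could be computed on unlabeled target-domain images alongside a supervised pose loss on labeled source images.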

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Human Pose and Action Recognition

Methods: Softmax · Attention Is All You Need