Rethinking Self-Supervised Visual Representation Learning in Pre-training for 3D Human Pose and Shape Estimation

Hongsuk Choi; Hyeongjin Nam; Taeryung Lee; Gyeongsik Moon; Kyoung Mu Lee

arXiv:2303.05370·cs.CV·March 10, 2023·1 cites

TL;DR

This paper critically evaluates self-supervised learning (SSL) as a pre-training strategy for 3D human pose and shape estimation, finding that it underperforms conventional ImageNet classification pre-training and highlighting the benefits of 2D annotation-based pre-training.

Contribution

The study provides an empirical analysis comparing SSL with other pre-training alternatives (2D annotation-based and synthetic data pre-training) for 3D human pose and shape estimation (3DHPSE), emphasizing the effectiveness of 2D annotation-based pre-training.

Findings

01 SSL underperforms ImageNet pre-training by 7.7% on average

02 2D annotation-based pre-training improves accuracy and convergence

03 Other data types in pre-training can be more valuable than SSL for 3DHPSE

Abstract

Recently, a few self-supervised representation learning (SSL) methods have outperformed ImageNet classification pre-training on vision tasks such as object detection. However, their effects on 3D human body pose and shape estimation (3DHPSE) are open to question, since its target is fixed to a single class, the human, and it has an inherent task gap with SSL. We empirically study and analyze the effects of SSL and further compare it with other pre-training alternatives for 3DHPSE. The alternatives are 2D annotation-based pre-training and synthetic data pre-training, which share SSL's motivation of reducing the labeling cost. They have been widely used as a source of weak supervision or for fine-tuning, but have not been considered as a pre-training source. SSL methods underperform the conventional ImageNet classification pre-training on multiple 3DHPSE benchmarks by 7.7% on…

Videos

Rethinking Self-Supervised Visual Representation Learning in Pre-training for 3D Human Pose and Shape Estimation · SlidesLive

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Hand Gesture Recognition Systems