Learning Heatmap-Style Jigsaw Puzzles Provides Good Pretraining for 2D Human Pose Estimation
Kun Zhang, Rui Wu, Ping Yao, Kai Deng, Ding Li, Renbiao Liu, Chuanguang Yang, Ge Chen, Min Du, Tianyao Zheng

TL;DR
This paper introduces a self-supervised pretraining method called Heatmap-Style Jigsaw Puzzles (HSJP) that improves 2D human pose estimation by learning patch locations from person images without extra datasets.
Contribution
The novel HSJP pretraining task leverages patch shuffling and heatmap labels, enhancing pose estimation models without relying on ImageNet pretraining.
Findings
HSJP pretraining improves pose estimation accuracy.
Models pretrained with HSJP outperform from-scratch training.
Performance is comparable to ImageNet-based initializations.
Abstract
The goal of 2D human pose estimation is to locate the keypoints of body parts in input 2D images. State-of-the-art methods for pose estimation usually construct pixel-wise heatmaps from keypoints as labels for training convolutional neural networks, whose backbones are usually initialized either randomly or from classification models trained on ImageNet. We note that the 2D pose estimation task is highly dependent on the contextual relationship between image patches, so we introduce a self-supervised method for pretraining 2D pose estimation networks. Specifically, we propose the Heatmap-Style Jigsaw Puzzles (HSJP) problem as our pretext task, whose target is to learn the location of each patch from an image composed of shuffled patches. During our pretraining process, we use only images of person instances in MS-COCO, rather than introducing the extra and much larger ImageNet dataset. A…
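The pretext task described in the abstract can be illustrated with a minimal sketch. The abstract is truncated here and does not give the exact label construction, so everything below is an assumption made for illustration: the grid size, the Gaussian shape of the labels, the one-channel-per-original-patch layout, and the function name `make_hsjp_sample` are hypothetical and not taken from the paper.

```python
import numpy as np


def make_hsjp_sample(image, grid=(4, 3), sigma=0.5, rng=None):
    """Shuffle image patches and build heatmap-style location labels.

    Assumed label layout (not from the paper): channel k is a Gaussian
    centred on the slot where the patch that originally sat at grid
    cell k ends up after shuffling.
    """
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = grid
    ph, pw = image.shape[0] // rows, image.shape[1] // cols
    # Crop so the image divides evenly into the patch grid.
    image = image[: ph * rows, : pw * cols]
    height, width = image.shape[:2]
    n = rows * cols

    # perm[slot] is the original index of the patch placed in that slot.
    perm = rng.permutation(n)
    shuffled = np.empty_like(image)
    heatmaps = np.zeros((n, height, width), dtype=np.float32)
    ys, xs = np.mgrid[0:height, 0:width]
    std = sigma * min(ph, pw)  # Gaussian width in pixels (assumed)

    for slot, src in enumerate(perm):
        sr, sc = divmod(int(src), cols)  # original grid cell
        dr, dc = divmod(slot, cols)      # destination grid cell
        shuffled[dr * ph:(dr + 1) * ph, dc * pw:(dc + 1) * pw] = \
            image[sr * ph:(sr + 1) * ph, sc * pw:(sc + 1) * pw]
        # Centre of the destination slot in pixel coordinates.
        cy = dr * ph + (ph - 1) / 2
        cx = dc * pw + (pw - 1) / 2
        heatmaps[src] = np.exp(
            -((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * std ** 2)
        )
    return shuffled, heatmaps, perm


if __name__ == "__main__":
    person_crop = np.random.default_rng(0).random((256, 192, 3))
    x, y, perm = make_hsjp_sample(person_crop, grid=(4, 3))
    print(x.shape, y.shape, perm)
```

Under these assumptions, a pose network would take the shuffled image as input and regress the stacked heatmaps, which mirrors the keypoint-heatmap output format used in the downstream pose estimation task.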
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Video Surveillance and Tracking Methods
Methods: Jigsaw · Batch Normalization · Residual Connection · HRNet · Convolution
