HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception
Junkun Yuan, Xinyu Zhang, Hao Zhou, Jian Wang, Zhongwei Qiu, Zhiyin Shao, Shaofeng Zhang, Sifan Long, Kun Kuang, Kun Yao, Junyu Han, Errui Ding, Lanfen Lin, Fei Wu, Jingdong Wang

TL;DR
HAP introduces a human structure-aware masked image modeling approach that leverages human parts as priors to improve pre-training for human-centric perception tasks, achieving state-of-the-art results.
Contribution
The paper proposes HAP, a masked image modeling method guided by a human structure prior. HAP gives human-part regions priority when sampling which patches to mask, and adds a structure-invariant alignment loss.
Findings
Achieves 78.1% mAP on MSMT17 for person re-identification
Attains 86.54% mA on PA-100K for pedestrian attribute recognition
Reaches 78.2% AP on MS COCO for 2D pose estimation
Abstract
Model pre-training is essential in human-centric perception. In this paper, we first introduce masked image modeling (MIM) as a pre-training approach for this task. Upon revisiting the MIM training strategy, we reveal that human structure priors offer significant potential. Motivated by this insight, we further incorporate an intuitive human structure prior, human parts, into pre-training. Specifically, we use this prior to guide the mask sampling process: image patches corresponding to human part regions are given high priority to be masked out. This encourages the model to concentrate more on body structure information during pre-training, yielding substantial benefits across a range of human-centric perception tasks. To further capture human characteristics, we propose a structure-invariant alignment loss that enforces different masked views, guided by the human part prior, to be…
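The abstract says that patches covering human-part regions get high priority for masking. A minimal sketch of that idea follows, assuming the part regions are already available as a set of patch indices (how they are obtained is outside this sketch) and a fixed masking ratio. The function name `part_guided_mask`, the `part_bonus` parameter, and the 0.75 default ratio are hypothetical illustration choices, not HAP's actual implementation.

```python
import random
from typing import List, Optional, Set


def part_guided_mask(
    num_patches: int,
    part_patches: Set[int],
    mask_ratio: float = 0.75,  # hypothetical default, not taken from the paper
    part_bonus: float = 1.0,   # hypothetical knob controlling part priority
    rng: Optional[random.Random] = None,
) -> List[bool]:
    """Return a per-patch mask where True means the patch is masked out.

    Sketch only: every patch gets a random score in [0, 1), and patches
    inside a human-part region get `part_bonus` added. The highest-scoring
    patches are masked, so part patches are masked first.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = rng or random.Random()
    num_masked = int(round(num_patches * mask_ratio))

    # Random score per patch, boosted for patches in human-part regions.
    scores = [
        rng.random() + (part_bonus if i in part_patches else 0.0)
        for i in range(num_patches)
    ]

    # Rank patches by descending score and mask the top `num_masked`.
    order = sorted(range(num_patches), key=lambda i: scores[i], reverse=True)
    masked = set(order[:num_masked])
    return [i in masked for i in range(num_patches)]


if __name__ == "__main__":
    # Toy example: 16 patches, the first 6 of which lie on human parts.
    mask = part_guided_mask(16, {0, 1, 2, 3, 4, 5}, mask_ratio=0.5,
                            rng=random.Random(0))
    print(sum(mask), "patches masked:", [i for i, m in enumerate(mask) if m])
```

With `part_bonus` at 1.0 or above, every part patch outranks every non-part patch, so the ordering is strict. Values below 1.0 let some non-part patches outrank part patches, which softens the priority while keeping the mask random.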
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Advanced Neural Network Applications
Methods: Masked Image Modeling (MIM)
