Multi-Scale Supervised Network for Human Pose Estimation
Lipeng Ke, Ming-Ching Chang, Honggang Qi, Siwei Lyu

TL;DR
This paper introduces a multi-scale supervised deep network for human pose estimation that improves keypoint detection accuracy and structural consistency, effectively handling occlusions and ambiguous matches.
Contribution
It proposes a novel multi-scale supervision and global regression framework that enhances pose estimation robustness and accuracy over previous methods.
Findings
Achieves competitive results on MPII and FLIC datasets.
Effectively disambiguates close keypoints and handles occlusions.
Improves structural consistency of estimated poses.
Abstract
Human pose estimation is an important topic in computer vision with many applications including gesture and activity recognition. However, pose estimation from images is challenging due to appearance variations, occlusions, cluttered backgrounds, and complex activities. To alleviate these problems, we develop a robust pose estimation method based on the recent deep conv-deconv modules with two improvements: (1) multi-scale supervision of body keypoints, and (2) a global regression to improve structural consistency of keypoints. We refine keypoint detection heatmaps using layer-wise multi-scale supervision to better capture local contexts. Pose inference via keypoint association is optimized globally using a regression network at the end. Our method can effectively disambiguate keypoint matches in close proximity including the mismatch of left-right body parts, and better infer occluded…
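As a rough illustration of what supervising keypoint heatmaps at several scales can look like, the NumPy sketch below renders one keypoint as a Gaussian target at multiple resolutions and sums a per-scale mean squared error. It is not the authors' implementation: the scale set, the Gaussian width, the MSE loss, and every function name here are assumptions chosen for illustration only.

```python
import numpy as np

def gaussian_heatmap(size, center_xy, sigma):
    """Render one keypoint as a 2D Gaussian on a size x size grid."""
    ys, xs = np.mgrid[0:size, 0:size]
    cx, cy = center_xy
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def multiscale_targets(keypoint_xy, full_size=64, scales=(1, 2, 4, 8), sigma=2.0):
    """Build target heatmaps at several resolutions.

    The scale set and sigma are hypothetical, not values from the paper.
    """
    targets = []
    for s in scales:
        size = full_size // s
        center = (keypoint_xy[0] / s, keypoint_xy[1] / s)
        # Shrink the Gaussian with the grid, but keep it at least half a pixel wide.
        targets.append(gaussian_heatmap(size, center, max(sigma / s, 0.5)))
    return targets

def multiscale_loss(predictions, targets):
    """Sum of per-scale mean squared errors (an assumed loss form)."""
    return sum(float(np.mean((p - t) ** 2)) for p, t in zip(predictions, targets))

# One keypoint at (x=40, y=20) on a 64 x 64 grid.
targets = multiscale_targets((40, 20))
# Stand-in "predictions": the targets with a constant offset at every scale.
noisy = [t + 0.1 for t in targets]
loss_perfect = multiscale_loss(targets, targets)
loss_noisy = multiscale_loss(noisy, targets)
```

In a real network the predictions at each scale would come from intermediate deconvolution layers rather than from perturbed targets; the sketch only shows how a coarse-to-fine set of targets and a summed loss fit together.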
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Hand Gesture Recognition Systems
