MDPose: Real-Time Multi-Person Pose Estimation via Mixture Density Model
Seunghyeon Seo, Jaeyoung Yoo, Jihye Hwang, Nojun Kwak

TL;DR
MDPose introduces a single-stage, real-time multi-person pose estimation framework that models the joint distribution of human keypoints with a mixture density model, improving both accuracy in occlusion scenarios and inference speed.
Contribution
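The paper proposes MDPose, a novel mixture density model for single-stage, instance-aware pose estimation. It removes the need for an off-the-shelf detector, heuristic post-grouping, or an explicit instance identification step, which simplifies the pipeline and improves robustness to occlusion.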
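The summary does not say how person instances are read out of the mixture. Below is a minimal sketch, written as an illustration rather than the authors' code. It assumes that each mixture component represents one candidate person, that the component's mean vector stores the flattened (x, y) coordinates of all K keypoints, and that a component is kept when its mixing coefficient exceeds a fraction of the largest one. `decode_poses` and `rel_thresh` are hypothetical names.

```python
import numpy as np

def decode_poses(pi, mu, num_keypoints, rel_thresh=0.5):
    """Hypothetical instance decoding from a keypoint mixture.

    pi:  (M,) mixing coefficients, one per component (assumed: one component = one person)
    mu:  (M, 2 * num_keypoints) component means laid out as x1, y1, x2, y2, ...
    rel_thresh: assumed rule, keep components with pi >= rel_thresh * max(pi)
    """
    keep = np.flatnonzero(pi >= rel_thresh * pi.max())
    # Order the kept components by mixing weight, highest first
    keep = keep[np.argsort(-pi[keep], kind="stable")]
    # Each kept mean is already a complete pose, so no keypoint grouping is needed
    poses = mu[keep].reshape(len(keep), num_keypoints, 2)
    return poses, pi[keep]

# Toy usage: 3 components, 2 keypoints each
pi = np.array([0.5, 0.05, 0.45])
mu = np.arange(12, dtype=float).reshape(3, 4)
poses, scores = decode_poses(pi, mu, num_keypoints=2)
```

The sketch shows why a per-instance mixture can skip post-grouping: every surviving component already carries a whole skeleton. The relative threshold is one possible selection rule. Because mixing coefficients sum to one, a fixed absolute threshold would behave differently depending on how many people are in the image.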
Findings
Achieves state-of-the-art performance on occlusion-heavy datasets
Significantly improves inference speed over previous methods
Successfully models high-dimensional joint distributions of keypoints
Abstract
One of the major challenges in multi-person pose estimation is instance-aware keypoint estimation. Previous methods address this problem by leveraging an off-the-shelf detector, a heuristic post-grouping process, or an explicit instance identification process. These extra steps hinder further improvement in inference speed, which is an important factor for practical applications. From a statistical point of view, such additional processes for identifying instances are necessary to avoid learning the high-dimensional joint distribution of human keypoints, and this joint distribution is critical for another major challenge: occlusion. In this work, we propose a novel framework for single-stage instance-aware pose estimation, termed MDPose, that models the joint distribution of human keypoints with a mixture density model. Our MDPose estimates the distribution of human keypoints' coordinates using a…
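The abstract describes fitting a mixture density model to the joint distribution of keypoint coordinates. The sketch below shows the generic mixture density training objective: the negative log-likelihood of a ground-truth pose under a Gaussian mixture. It assumes diagonal-covariance Gaussian components over the flattened 2K-dimensional coordinate vector. The abstract is truncated, so the actual MDPose parameterization may differ, including the covariance structure, how components tie to image locations, and any strategy for handling the high dimensionality. `mixture_nll` and `log_softmax` are illustrative helpers, not a published API.

```python
import numpy as np

def log_softmax(logits):
    """Turn raw component scores into log mixing coefficients (numerically stable)."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))

def mixture_nll(y, log_pi, mu, log_sigma):
    """Negative log-likelihood of one pose y under a diagonal Gaussian mixture.

    y:         (D,)   flattened keypoint coordinates, D = 2 * num_keypoints
    log_pi:    (M,)   log mixing coefficients
    mu:        (M, D) component means
    log_sigma: (M, D) per-dimension log standard deviations (assumed diagonal covariance)
    """
    d = y.shape[0]
    sigma = np.exp(log_sigma)
    z = (y[None, :] - mu) / sigma
    # log N(y | mu_m, diag(sigma_m^2)) for every component m
    log_norm = (-0.5 * np.sum(z ** 2, axis=1)
                - np.sum(log_sigma, axis=1)
                - 0.5 * d * np.log(2 * np.pi))
    # log sum_m pi_m N(y | ...) via the log-sum-exp trick
    a = log_pi + log_norm
    a_max = np.max(a)
    return -(a_max + np.log(np.sum(np.exp(a - a_max))))

# Toy usage: 3 components, 2 keypoints (D = 4), two ground-truth people in one image
rng = np.random.default_rng(0)
log_pi = log_softmax(np.array([2.0, 0.5, 1.0]))
mu = rng.normal(size=(3, 4))
log_sigma = np.zeros((3, 4))
gts = [mu[0] + 0.1, mu[2] - 0.1]
loss = np.mean([mixture_nll(y, log_pi, mu, log_sigma) for y in gts])
```

Each ground-truth person is treated as one sample from the image's mixture. Minimizing this loss therefore pushes some component to cover every person, and no separate grouping target is needed. Working in log space keeps the sum over components stable even when D is large and the individual densities underflow.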
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging
