MDPose: Real-Time Multi-Person Pose Estimation via Mixture Density Model

Seunghyeon Seo; Jaeyoung Yoo; Jihye Hwang; Nojun Kwak

arXiv:2302.08751·cs.CV·May 9, 2023·1 cites

Open Access

TL;DR

MDPose introduces a single-stage, real-time multi-person pose estimation framework that models keypoint distributions with a mixture density approach, improving both accuracy in occlusion scenarios and inference speed.

Contribution

The paper proposes MDPose, a novel mixture density model for single-stage, instance-aware pose estimation that simplifies the pipeline and enhances occlusion handling.

Findings

01 Achieves state-of-the-art performance on occlusion-heavy datasets

02 Significantly improves inference speed over previous methods

03 Successfully models high-dimensional joint distributions of keypoints

Abstract

One of the major challenges in multi-person pose estimation is instance-aware keypoint estimation. Previous methods address this problem by leveraging an off-the-shelf detector, a heuristic post-grouping process, or an explicit instance identification process, which hinders further improvements in inference speed, an important factor for practical applications. From a statistical point of view, those additional processes for identifying instances are necessary to bypass learning the high-dimensional joint distribution of human keypoints, which is a critical factor for another major challenge, the occlusion scenario. In this work, we propose a novel framework for single-stage instance-aware pose estimation, termed MDPose, that models the joint distribution of human keypoints with a mixture density model. Our MDPose estimates the distribution of human keypoints' coordinates using a…
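As a rough illustration of what "modeling the joint distribution of human keypoints with a mixture density model" can mean, the sketch below scores whole poses (all keypoint coordinates concatenated into one vector) under a mixture and returns the negative log-likelihood that such a model would typically minimize. It is not taken from the paper: the diagonal Gaussian components, the function name mixture_nll, and the toy numbers are all assumptions made for illustration, and the listing above does not state which component distribution or network MDPose actually uses.

```python
import numpy as np


def mixture_nll(poses, weights, means, scales):
    """Mean negative log-likelihood of poses under a diagonal Gaussian mixture.

    Hypothetical helper; the Gaussian component choice is an assumption.

    poses:   (N, D) ground-truth poses, D = 2 * number of keypoints
    weights: (K,)   mixture weights, positive and summing to 1
    means:   (K, D) per-component mean pose
    scales:  (K, D) per-component standard deviations
    """
    # Standardised residual of every pose against every component: (N, K, D).
    diff = (poses[:, None, :] - means[None, :, :]) / scales[None, :, :]
    # Log density of each pose under each component, summed over dimensions.
    log_norm = np.log(2.0 * np.pi * scales[None, :, :] ** 2)
    log_comp = -0.5 * np.sum(diff ** 2 + log_norm, axis=-1)  # (N, K)
    # Add log mixture weights, then log-sum-exp over components for stability.
    log_joint = log_comp + np.log(weights)[None, :]
    peak = log_joint.max(axis=1, keepdims=True)
    log_lik = peak[:, 0] + np.log(np.exp(log_joint - peak).sum(axis=1))
    return -log_lik.mean()


# Toy example: two people, three keypoints each (D = 6), two components.
poses = np.array([[10.0, 10.0, 12.0, 14.0, 11.0, 20.0],
                  [50.0, 12.0, 52.0, 16.0, 51.0, 22.0]])
weights = np.array([0.5, 0.5])
means = poses.copy()          # each component centred on one person
scales = np.full((2, 6), 2.0)
loss = mixture_nll(poses, weights, means, scales)
```

Because each component covers an entire pose vector rather than a single joint, one component can stand for one person, which is the intuition behind obtaining instance-aware keypoints without a separate grouping step.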

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings