Efficient Multi-Person Pose Estimation with Provable Guarantees
Shaofei Wang, Konrad Paul Kording, Julian Yarkony

TL;DR
This paper presents a novel bottom-up method for multi-person pose estimation that formulates the problem as a set packing task, employing an efficient algorithm with provable guarantees to achieve near-optimal solutions.
Contribution
It introduces a new algorithm that combines implicit column generation with nested Benders decomposition to solve the minimum-weight set packing (MWSP) problem for multi-person pose estimation (MPPE) efficiently and with provable bounds.
Findings
Achieves accuracy comparable to state-of-the-art methods on the MPII-Multiperson dataset.
Provides globally optimal solutions for over 99% of instances, and bounds for the remaining instances.
Speeds up inference significantly compared to naive dynamic programming.
Abstract
Multi-person pose estimation (MPPE) in natural images is key to the meaningful use of visual data in many fields, including movement science, security, and rehabilitation. In this paper we tackle MPPE with a bottom-up approach, starting with candidate detections of body parts from a convolutional neural network (CNN) and grouping them into people. We formulate the grouping of body part detections into people as a minimum-weight set packing (MWSP) problem in which the set of potential people is the power set of body part detections. We model the quality of a hypothesis of a person, which is a set in the MWSP, by an augmented tree-structured Markov random field in which variables correspond to body parts and their state spaces correspond to the power set of the detections for that part. We describe a novel algorithm that combines efficiency with provable bounds on this MWSP problem. We employ an…
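To make the MWSP formulation concrete, here is a minimal brute-force sketch on a toy instance. It is not the paper's algorithm (which relies on column generation and nested Benders decomposition); it only illustrates the objective: pick pairwise-disjoint sets of detections with minimum total cost. The function name, the detection ids, and the costs are all hypothetical; in the paper the cost of a person hypothesis comes from the tree-structured Markov random field.

```python
from itertools import combinations


def min_weight_set_packing(candidates):
    """Brute-force MWSP over a small list of candidate people.

    candidates: list of (frozenset_of_detection_ids, cost) pairs.
    Returns (best_cost, chosen_indices). Exponential in len(candidates),
    so this is only usable on toy instances.
    """
    best_cost, best_choice = 0.0, ()  # selecting no one is feasible, cost 0
    n = len(candidates)
    for r in range(1, n + 1):
        for choice in combinations(range(n), r):
            used = set()
            feasible = True
            for i in choice:
                members = candidates[i][0]
                if used & members:  # a detection would belong to two people
                    feasible = False
                    break
                used |= members
            if not feasible:
                continue
            cost = sum(candidates[i][1] for i in choice)
            if cost < best_cost:
                best_cost, best_choice = cost, choice
    return best_cost, best_choice


# Toy instance (assumed values): four detections, four candidate people.
# A negative cost marks a plausible person, a positive cost an implausible one.
people = [
    (frozenset({0, 1}), -3.0),
    (frozenset({1, 2}), -2.5),
    (frozenset({2, 3}), -2.0),
    (frozenset({0, 3}), 1.0),
]
best_cost, chosen = min_weight_set_packing(people)
print(best_cost, chosen)  # -5.0 (0, 2)
```

Because the set of potential people is the power set of the detections, enumerating candidates like this is infeasible at realistic scale, which is the motivation for an algorithm that generates candidate sets implicitly.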
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Advanced Vision and Imaging
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
