BAPose: Bottom-Up Pose Estimation with Disentangled Waterfall Representations
Bruno Artacho, Andreas Savakis

TL;DR
BAPose is a bottom-up multi-person pose estimation framework that uses a disentangled multi-scale waterfall architecture and adaptive convolutions, achieving state-of-the-art accuracy, especially in crowded scenes.
Contribution
The paper presents a novel end-to-end trainable bottom-up approach with a disentangled waterfall module and adaptive convolutions for improved multi-person pose estimation.
Findings
Achieves state-of-the-art results on COCO and CrowdPose datasets.
Effectively handles occlusions and crowded scenes.
Demonstrates robustness and efficiency in multi-person pose estimation.
Abstract
We propose BAPose, a novel bottom-up approach that achieves state-of-the-art results for multi-person pose estimation. Our end-to-end trainable framework leverages a disentangled multi-scale waterfall architecture and incorporates adaptive convolutions to infer keypoints more precisely in crowded scenes with occlusions. The multi-scale representations, obtained by the disentangled waterfall module in BAPose, leverage the efficiency of progressive filtering in the cascade architecture, while maintaining multi-scale fields-of-view comparable to spatial pyramid configurations. Our results on the challenging COCO and CrowdPose datasets demonstrate that BAPose is an efficient and robust framework for multi-person pose estimation, achieving significant improvements on state-of-the-art accuracy.
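The abstract's contrast between cascade (waterfall) filtering and spatial pyramid configurations can be illustrated with simple field-of-view arithmetic. The sketch below is not the authors' implementation: the 3x3 kernel size, the dilation rates (6, 12, 18, 24), the stride of 1, and the function names are all assumptions chosen for illustration. It compares the field of view of each branch when dilated convolutions run in parallel (pyramid style) against the case where they are chained and the output of every stage is tapped (waterfall style).

```python
from typing import List


def dilated_fov(kernel: int, dilation: int) -> int:
    """Field of view, in input pixels, of one dilated convolution (stride 1)."""
    return (kernel - 1) * dilation + 1


def parallel_fovs(kernel: int, rates: List[int]) -> List[int]:
    """Pyramid-style: every branch sees the input directly."""
    return [dilated_fov(kernel, r) for r in rates]


def waterfall_fovs(kernel: int, rates: List[int]) -> List[int]:
    """Cascade-style: each stage filters the previous stage's output,
    and the output of every stage is kept as one scale of the representation."""
    fovs = []
    fov = 1  # a single input pixel before any filtering
    for r in rates:
        fov += (kernel - 1) * r  # each stride-1 stage widens the view
        fovs.append(fov)
    return fovs


# Assumed values for illustration only; not taken from the paper.
KERNEL = 3
RATES = [6, 12, 18, 24]

print("parallel :", parallel_fovs(KERNEL, RATES))
print("waterfall:", waterfall_fovs(KERNEL, RATES))
```

Under these assumptions, both layouts produce one output per scale, but the cascade's later taps build on the filtering already done by earlier stages, which is one way to read the "progressive filtering" the abstract refers to.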
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications
