Waterfall Transformer for Multi-person Pose Estimation
Navin Ranjan, Bruno Artacho, Andreas Savakis

TL;DR
The paper introduces WTPose, a transformer-based framework for multi-person pose estimation that strengthens feature representation with a waterfall module and outperforms other transformer methods on the COCO dataset.
Contribution
It presents a novel Waterfall Transformer architecture that effectively captures multi-scale features for pose estimation in a single-pass, end-to-end trainable model.
Findings
Outperforms other transformer architectures on the COCO dataset
Uses a waterfall module to generate multi-scale feature maps from several backbone stages
Expands receptive fields and captures both local and global context
Abstract
We propose the Waterfall Transformer architecture for Pose estimation (WTPose), a single-pass, end-to-end trainable framework designed for multi-person pose estimation. Our framework leverages a transformer-based waterfall module that generates multi-scale feature maps from various backbone stages. The module performs filtering in the cascade architecture to expand the receptive fields and to capture local and global context, therefore increasing the overall feature representation capability of the network. Our experiments on the COCO dataset demonstrate that the proposed WTPose architecture, with a modified Swin backbone and transformer-based waterfall module, outperforms other transformer architectures for multi-person pose estimation.
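The cascade idea in the abstract, where each branch filters the output of the previous one so that the receptive field grows while every intermediate scale is kept, can be illustrated with a toy sketch. This is not the WTPose module: the paper's waterfall module is transformer-based and operates on 2D backbone feature maps, whereas the example below uses a fixed 3-tap filter on a 1D signal. The function names and the dilation rates (1, 2, 4) are assumptions chosen only for illustration.

```python
def dilated_filter(signal, kernel, dilation):
    """1D 'same' filtering with zero padding and a dilated kernel."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation  # tap position, spaced by the dilation
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def waterfall(signal, kernel, dilations):
    """Toy cascade: each branch filters the previous branch's output.

    All branch outputs are returned, so the caller gets one feature
    per scale (hypothetical stand-in for multi-scale feature maps).
    """
    branches = []
    current = signal
    for d in dilations:
        current = dilated_filter(current, kernel, d)
        branches.append(current)
    return branches


def receptive_fields(kernel_size, dilations):
    """Receptive field after each cascade stage: grows by (k - 1) * d per stage."""
    rf = 1
    fields = []
    for d in dilations:
        rf += (kernel_size - 1) * d
        fields.append(rf)
    return fields


# Feed a unit impulse through the cascade and measure how far it spreads.
impulse = [0.0] * 21
impulse[10] = 1.0
dilations = [1, 2, 4]  # assumed rates, not taken from the paper
branches = waterfall(impulse, [1.0, 1.0, 1.0], dilations)
spans = [sum(1 for v in b if v != 0.0) for b in branches]
fields = receptive_fields(3, dilations)

print(spans)   # [3, 7, 15]
print(fields)  # [3, 7, 15]
```

The measured spread of the impulse matches the analytic receptive field at every stage, which is the sense in which cascading the branches, rather than running them in parallel on the same input, widens the context seen by later branches.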
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Human Pose and Action Recognition · Gait Recognition and Analysis
Methods: Attention Is All You Need · Residual Connection · Softmax · Adam · Label Smoothing · Dropout · Dense Connections · Linear Layer · Layer Normalization · Byte Pair Encoding
