RTMW: Real-Time Multi-Person 2D and 3D Whole-body Pose Estimation
Tao Jiang, Xinchen Xie, Yining Li

TL;DR
RTMW introduces high-performance, real-time models for multi-person 2D and 3D whole-body pose estimation that outperform existing methods on multiple benchmarks while remaining efficient. Code and models are open source.
Contribution
This work presents RTMW, a novel series of models that achieve state-of-the-art accuracy in 2D/3D whole-body pose estimation with a new architecture and training strategy.
Findings
RTMW-l achieves 70.2 mAP on the COCO-Wholebody benchmark.
Models demonstrate strong performance across multiple benchmarks.
High inference efficiency and deployment friendliness are maintained.
Abstract
Whole-body pose estimation is a challenging task that requires simultaneous prediction of keypoints for the body, hands, face, and feet. It aims to predict fine-grained pose information for the human body, including the face, torso, hands, and feet, and plays an important role in the study of human-centric perception and generation and in various applications. In this work, we present RTMW (Real-Time Multi-person Whole-body pose estimation models), a series of high-performance models for 2D/3D whole-body pose estimation. We combine the RTMPose model architecture with FPN and HEM (Hierarchical Encoding Module) to better capture pose information from body parts at different scales. The model is trained on a rich collection of open-source human keypoint datasets with manually aligned annotations and further enhanced via a two-stage distillation…
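The abstract names FPN as the component that lets the model handle body parts at different scales, but does not spell out how it is wired in. As a rough illustration only, the sketch below shows the generic FPN top-down merge (a 1x1 lateral convolution per level, then nearest-neighbour upsampling and addition). It is not RTMW's implementation: the function names, the pure-Python list layout `[C][H][W]`, and the toy weights are all assumptions made for this example, the 3x3 smoothing convolution of a standard FPN is omitted, and HEM is not shown because the abstract gives no detail on it.

```python
def conv1x1(feat, weight):
    """1x1 convolution: mixes channels independently at every pixel.
    feat: [C_in][H][W], weight: [C_out][C_in] -> [C_out][H][W]."""
    c_in, h, w = len(feat), len(feat[0]), len(feat[0][0])
    return [
        [[sum(weight[o][c] * feat[c][y][x] for c in range(c_in)) for x in range(w)]
         for y in range(h)]
        for o in range(len(weight))
    ]


def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a [C][H][W] map."""
    return [
        [[row[x // 2] for x in range(2 * len(row))] for row in ch for _ in (0, 1)]
        for ch in feat
    ]


def add(a, b):
    """Element-wise sum of two [C][H][W] maps of identical shape."""
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def fpn_top_down(features, laterals):
    """Generic FPN top-down pathway (illustrative, not RTMW-specific).
    features: backbone maps ordered fine -> coarse, each half the size of the previous.
    laterals: one 1x1 weight matrix per level, all with the same output channel count."""
    projected = [conv1x1(f, w) for f, w in zip(features, laterals)]
    outputs = [projected[-1]]  # the coarsest level passes through unchanged
    for lateral in reversed(projected[:-1]):
        # Upsample the coarser output and add it to the finer lateral projection.
        outputs.insert(0, add(lateral, upsample2x(outputs[0])))
    return outputs


# Toy input: a fine 1-channel 4x4 map and a coarse 2-channel 2x2 map.
c3 = [[[1.0] * 4 for _ in range(4)]]
c4 = [[[1.0, 2.0], [3.0, 4.0]], [[10.0, 20.0], [30.0, 40.0]]]
p3, p4 = fpn_top_down([c3, c4], laterals=[[[2.0]], [[1.0, 1.0]]])
```

Here `p3` keeps the fine 4x4 resolution but now carries information from the coarse level, which is the property that helps with small parts such as hands and faces alongside the full body.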
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Human Motion and Animation
Methods: Convolution · 1x1 Convolution · Feature Pyramid Network
