DETRPose: Real-time end-to-end transformer model for multi-person pose estimation
Sebastian Janampa, Marios Pattichis

TL;DR
This paper introduces DETRPose, a transformer-based model capable of real-time multi-person pose estimation, offering faster training, competitive inference, and high accuracy with fewer parameters.
Contribution
The paper presents a transformer architecture for multi-person pose estimation (MPPE) that trains in 5 to 10 times fewer epochs than state-of-the-art models and reaches competitive inference times without relying on quantization libraries.
Findings
Faster training with 5-10x fewer epochs
Competitive inference times without quantization
Outperforms or matches state-of-the-art accuracy with fewer parameters
Abstract
Multi-person pose estimation (MPPE) estimates keypoints for all individuals present in an image. MPPE is a fundamental task for several applications in computer vision and virtual reality. Unfortunately, there are currently no transformer-based models that can perform MPPE in real time. The paper presents a family of transformer-based models capable of performing multi-person 2D pose estimation in real-time. Our approach utilizes a modified decoder architecture and keypoint similarity metrics to generate both positive and negative queries, thereby enhancing the quality of the selected queries within the architecture. Compared to state-of-the-art models, our proposed models train much faster, using 5 to 10 times fewer epochs, with competitive inference times without requiring quantization libraries to speed up the model. Furthermore, our proposed models provide competitive results or…
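The abstract mentions keypoint similarity metrics for producing positive and negative queries but does not say which metric or how the split is made. As an illustration only, the sketch below assumes the metric resembles the COCO-style Object Keypoint Similarity (OKS) and that candidates are split by a fixed threshold. The function names `oks` and `split_queries`, the `threshold` parameter, and the example values are hypothetical and are not taken from the paper.

```python
import math


def oks(pred, gt, visible, area, sigmas):
    """COCO-style Object Keypoint Similarity between two poses.

    pred, gt: lists of (x, y) keypoints in the same order.
    visible:  per-keypoint flags; keypoints with flag <= 0 are ignored.
    area:     ground-truth object area (the squared scale term in OKS).
    sigmas:   per-keypoint falloff constants.
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v <= 0:
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2.0 * sigma) ** 2
        total += math.exp(-d2 / (2.0 * area * k2))
        count += 1
    return total / count if count else 0.0


def split_queries(candidates, gt, visible, area, sigmas, threshold=0.5):
    """Hypothetical split: indices with OKS >= threshold are positive.

    This thresholding rule is an assumption made for illustration,
    not the procedure described in the paper.
    """
    positive, negative = [], []
    for i, cand in enumerate(candidates):
        score = oks(cand, gt, visible, area, sigmas)
        (positive if score >= threshold else negative).append(i)
    return positive, negative


# Example values (made up): two keypoints on an object of area 100.
gt_pose = [(10.0, 10.0), (20.0, 20.0)]
vis = [1, 1]
sig = [0.05, 0.05]
far_pose = [(100.0, 100.0), (200.0, 200.0)]

pos, neg = split_queries([gt_pose, far_pose], gt_pose, vis, 100.0, sig)
print(pos, neg)
```

Under these assumptions, a candidate identical to the ground truth scores an OKS of 1 and lands in the positive set, while a distant candidate scores near 0 and lands in the negative set.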
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
