DETRPose: Real-time end-to-end transformer model for multi-person pose estimation

Sebastian Janampa; Marios Pattichis

arXiv:2506.13027 · cs.CV · June 17, 2025

TL;DR

This paper introduces DETRPose, a family of transformer-based models for real-time multi-person pose estimation. The models train in fewer epochs, reach competitive inference times, and match or exceed state-of-the-art accuracy with fewer parameters.

Contribution

The paper presents a transformer architecture for multi-person pose estimation (MPPE) that trains in 5 to 10 times fewer epochs than state-of-the-art models and reaches competitive inference times without relying on quantization libraries.

Findings

01 Faster training with 5 to 10 times fewer epochs

02 Competitive inference times without quantization

03 Outperforms or matches state-of-the-art accuracy with fewer parameters

Abstract

Multi-person pose estimation (MPPE) estimates keypoints for all individuals present in an image. MPPE is a fundamental task for several applications in computer vision and virtual reality. Unfortunately, there are currently no transformer-based models that can perform MPPE in real time. This paper presents a family of transformer-based models capable of performing multi-person 2D pose estimation in real time. Our approach uses a modified decoder architecture and keypoint similarity metrics to generate both positive and negative queries, thereby improving the quality of the queries selected within the architecture. Compared to state-of-the-art models, our proposed models train much faster, using 5 to 10 times fewer epochs, and reach competitive inference times without requiring quantization libraries to speed up the model. Furthermore, our proposed models provide competitive results or…
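The abstract does not say which keypoint similarity metric is used. As a rough illustration only, the sketch below assumes a COCO-style Object Keypoint Similarity (OKS) and a simple threshold to separate candidate poses into positive and negative groups. The function names, the `pos_thresh` parameter, and the threshold rule are hypothetical and are not taken from the paper or its repository.

```python
import math


def oks(pred, gt, visible, area, sigmas):
    """COCO-style Object Keypoint Similarity between two poses.

    pred, gt : lists of (x, y) keypoint coordinates in pixels
    visible  : list of bools, True where the ground-truth keypoint is labeled
    area     : ground-truth object area in pixels^2 (the squared scale)
    sigmas   : per-keypoint falloff constants
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), vis, sigma in zip(pred, gt, visible, sigmas):
        if not vis:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2.0 * sigma) ** 2
        total += math.exp(-d2 / (2.0 * area * k2 + 1e-12))
        count += 1
    return total / count if count else 0.0


def split_queries(candidates, gt, visible, area, sigmas, pos_thresh=0.5):
    """Hypothetical rule: a candidate pose is 'positive' if its OKS against
    the ground truth reaches pos_thresh, and 'negative' otherwise."""
    positives, negatives = [], []
    for idx, pose in enumerate(candidates):
        score = oks(pose, gt, visible, area, sigmas)
        (positives if score >= pos_thresh else negatives).append((idx, score))
    return positives, negatives
```

Under these assumptions, a candidate identical to the ground truth scores an OKS of 1 and lands in the positive group, while a candidate shifted far from the person scores near 0 and lands in the negative group. How DETRPose actually builds and uses its positive and negative queries is described in the paper itself.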

Code & Models

Repositories

SebastianJanampa/DETRPose (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings