# DualPose: Dual-Block Transformer Decoder with Contrastive Denoising for Multi-Person Pose Estimation

**Authors:** Matteo Fincato, Roberto Vezzani

PMC · DOI: 10.3390/s25102997 · Sensors (Basel, Switzerland) · 2025-05-09

## TL;DR

DualPose is a multi-person pose estimation method that splits class prediction and keypoint estimation into parallel transformer decoder blocks and adds contrastive denoising during training to improve localization accuracy and robustness.

## Contribution

Introduces DualPose, an end-to-end framework whose transformer decoder separates person classification and keypoint estimation into two parallel blocks, trained with a contrastive denoising (CDN) mechanism.

## Key findings

- DualPose outperforms recent end-to-end methods on the MS COCO and CrowdPose datasets.
- The dual-block architecture improves keypoint localization and classification accuracy.
- Contrastive denoising improves robustness by training on noised copies of the ground truth that serve as positive and negative samples.
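The positive/negative sampling behind contrastive denoising can be sketched as follows. This is a minimal illustration assuming a DINO-style scheme, not DualPose's published noise model: each ground-truth pose yields one positive copy, whose per-coordinate offsets stay within `lambda1` times a per-person scale, and one negative copy, whose offsets fall between `lambda1` and `lambda2` times that scale. The function `make_cdn_queries`, the thresholds, and the per-person `scales` are hypothetical.

```python
import random
from typing import List, Tuple

Pose = List[Tuple[float, float]]


def _offset(rng: random.Random, low: float, high: float, scale: float) -> float:
    """Random-sign offset whose magnitude lies between low * scale and high * scale."""
    return rng.choice((-1.0, 1.0)) * rng.uniform(low, high) * scale


def make_cdn_queries(
    gt_poses: List[Pose],
    scales: List[float],
    lambda1: float = 0.4,
    lambda2: float = 1.0,
    seed: int = 0,
) -> List[Tuple[Pose, int, Pose]]:
    """Return (noisy_pose, label, source_pose) triples: label 1 = positive, 0 = negative.

    Hypothetical DINO-style scheme: positives get per-coordinate offsets within
    lambda1 * scale; negatives get offsets between lambda1 * scale and lambda2 * scale.
    """
    assert 0.0 < lambda1 < lambda2
    rng = random.Random(seed)
    queries = []
    for pose, s in zip(gt_poses, scales):
        positive = [(x + _offset(rng, 0.0, lambda1, s), y + _offset(rng, 0.0, lambda1, s)) for x, y in pose]
        negative = [(x + _offset(rng, lambda1, lambda2, s), y + _offset(rng, lambda1, lambda2, s)) for x, y in pose]
        queries.append((positive, 1, pose))
        queries.append((negative, 0, pose))
    return queries
```

In a training loop of this kind, positive queries would be supervised to reconstruct their source pose and negative queries to be classified as "no person". Clamping to image bounds is deliberately omitted so that the two offset ranges stay cleanly separated.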

## Abstract

Multi-person pose estimation is the task of detecting multiple people in a single image and regressing their keypoint coordinates. Significant progress has been achieved in recent years, especially since the introduction of transformer-based end-to-end methods. In this paper, we present DualPose, a novel framework that enhances multi-person pose estimation with a dual-block transformer decoder. Class prediction and keypoint estimation are split into parallel blocks so that each sub-task can be improved separately and the risk of interference between them is reduced. This architecture improves both the precision of keypoint localization and the model’s ability to correctly classify individuals. Within the Keypoint-Block, self-attention operations are processed in parallel, a novel strategy that further improves keypoint localization accuracy. Additionally, DualPose incorporates a contrastive denoising (CDN) mechanism that uses positive and negative samples to stabilize training and improve robustness. CDN creates a variety of training samples by introducing controlled noise into the ground truth, improving the model’s ability to distinguish valid from incorrect keypoints. Extensive experiments on the MS COCO and CrowdPose datasets show that DualPose achieves state-of-the-art results, outperforming recent end-to-end methods. The code and pretrained models are publicly available.
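One way to picture the split between class prediction and keypoint estimation is the toy decoder layer below. It is a hypothetical sketch, not the authors' implementation: attention uses identity query/key/value projections, and the Keypoint-Block's "parallel self-attentions" are assumed here to be one branch attending across the keypoints of each person and one attending across persons for each keypoint type, summed with a residual. The names `keypoint_block`, `class_block`, and `dual_block_layer`, and the logistic scoring head, are illustrative only.

```python
import math
from typing import List

Vec = List[float]


def self_attention(tokens: List[Vec]) -> List[Vec]:
    """Single-head scaled dot-product self-attention (identity Q/K/V projections)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) / z for j in range(d)])
    return out


def keypoint_block(queries: List[List[Vec]]) -> List[List[Vec]]:
    """queries[p][k] embeds keypoint k of person p (hypothetical layout).

    Two self-attention branches read the same input in parallel, and their
    outputs are added to a residual connection.
    """
    n_persons, n_kpts = len(queries), len(queries[0])
    # Branch A: attention among the keypoints of each person.
    intra = [self_attention(person) for person in queries]
    # Branch B: attention among persons for each keypoint type.
    inter = [self_attention([queries[p][k] for p in range(n_persons)]) for k in range(n_kpts)]
    return [
        [
            [x + a + b for x, a, b in zip(queries[p][k], intra[p][k], inter[k][p])]
            for k in range(n_kpts)
        ]
        for p in range(n_persons)
    ]


def class_block(person_queries: List[Vec], w: Vec, bias: float = 0.0) -> List[float]:
    """Separate branch: self-attention over person queries, then a logistic person score."""
    attended = self_attention(person_queries)
    return [1.0 / (1.0 + math.exp(-(sum(wi * hi for wi, hi in zip(w, h)) + bias))) for h in attended]


def dual_block_layer(person_queries: List[Vec], keypoint_queries: List[List[Vec]], w_cls: Vec):
    # The blocks consume separate query sets and share no intermediate state,
    # so class scores and keypoint features are updated independently.
    return class_block(person_queries, w_cls), keypoint_block(keypoint_queries)
```

Keeping the two blocks free of shared state is the simplest way to illustrate the "reduced interference" argument; a real decoder would add learned projections, feed-forward layers, and cross-attention to image features, and may exchange information between the blocks.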

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12114973/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12114973/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/PMC12114973/full.md

---
Source: https://tomesphere.com/paper/PMC12114973