# PoseNet++: A multi-scale and optimized feature extraction network for high-precision human pose estimation

**Authors:** Chao Lv, Geyao Ma

PMC · DOI: 10.1371/journal.pone.0326232 · PLOS One · 2025-06-25

## TL;DR

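PoseNet++ adds three attention and multi-scale modules to a 3-stacked hourglass network for human pose estimation. This raises accuracy while cutting parameters and computation, especially in scenes with occlusion, complex poses, or multiple people.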

## Contribution

PoseNet++ introduces three novel modules (MSPAHM, C-CPCA, and PBRM) that improve accuracy while reducing model complexity for human pose estimation.

## Key findings

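- PoseNet++ improves the PCKh score on the MPII validation set by 3.3% relative to the baseline 3-stacked hourglass network.
- Relative to the same baseline, it reduces the number of parameters by 60.3% and the number of floating-point operations (FLOPs) by 53.1%.
- PoseNet++ achieves state-of-the-art performance on the MPII, LSP, COCO, and CrowdPose datasets, with lower model complexity than methods of similar accuracy.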
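PCKh, the metric behind the MPII result, counts a predicted keypoint as correct when its distance to the ground truth is at most a fraction α of the person's head-segment length; MPII results are conventionally reported at α = 0.5 (PCKh@0.5). The minimal pure-Python sketch below illustrates the computation. It assumes head lengths are already available per person (the official MPII evaluation derives them from the annotated head box) and that a visibility flag marks which joints are annotated. `pckh` and its arguments are hypothetical names, not the paper's evaluation code.

```python
import math


def pckh(pred, gt, head_lengths, visible, alpha=0.5):
    """Hypothetical PCKh helper.

    pred, gt:      per-person lists of (x, y) keypoints
    head_lengths:  per-person head-segment length (assumed precomputed)
    visible:       per-person lists of 0/1 flags; 0 = joint not annotated
    alpha:         threshold as a fraction of head length (0.5 -> PCKh@0.5)
    """
    correct = 0
    total = 0
    for p_kps, g_kps, head, vis in zip(pred, gt, head_lengths, visible):
        threshold = alpha * head
        for (px, py), (gx, gy), v in zip(p_kps, g_kps, vis):
            if not v:
                continue  # unannotated joints are excluded from the score
            total += 1
            if math.hypot(px - gx, py - gy) <= threshold:
                correct += 1
    return correct / total if total else 0.0


# One person, head length 20 -> threshold of 10 pixels.
score = pckh(
    pred=[[(0, 0), (0, 0), (0, 0), (0, 0)]],
    gt=[[(3, 4), (6, 8), (12, 0), (1, 1)]],
    head_lengths=[20.0],
    visible=[[1, 1, 1, 0]],
)
print(score)  # distances 5, 10, 12 -> 2 of the 3 annotated joints are correct
```

Normalizing by head size makes the threshold scale with the person's apparent size in the image, so near and far subjects are judged on a comparable footing.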

## Abstract

Human pose estimation (HPE) has made significant progress with deep learning; however, it still struggles with occlusions, complex poses, and multi-person scenarios. To address these issues, we propose PoseNet++, a novel approach based on a 3-stacked hourglass architecture that incorporates three key innovations: the multi-scale spatial pyramid attention hourglass module (MSPAHM), coordinate-channel prior convolutional attention (C-CPCA), and the PinSK Bottleneck Residual Module (PBRM). MSPAHM enhances long-range channel dependencies, enabling the model to better capture structural relationships between limb joints, particularly under occlusion. C-CPCA combines coordinate attention (CA) and channel prior convolutional attention (CPCA) to prioritize keypoint regions and reduce confusion in complex multi-person scenarios. PBRM improves pose estimation accuracy by optimizing the receptive field and convolutional kernel selection, thus enhancing the network's feature extraction for multi-scale and complex poses. On the MPII validation set, PoseNet++ improves the PCKh score by 3.3% relative to the baseline 3-stacked hourglass network, while reducing the number of model parameters and the number of floating-point operations by 60.3% and 53.1%, respectively. Compared with other recent mainstream human pose estimation models, PoseNet++ achieves state-of-the-art performance on the MPII, LSP, COCO, and CrowdPose datasets. At the same time, the model complexity of PoseNet++ is much lower than that of methods with similar accuracy.
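C-CPCA builds on coordinate attention (CA). The core idea of CA is to pool a feature map separately along its height and its width, so the resulting attention gates keep positional information about where strong responses, such as keypoints, lie. The pure-Python sketch below shows only that directional pooling and gating on nested lists. It is an assumption-laden simplification: it omits CA's learned layers and does not show how the paper fuses CA with CPCA. `coordinate_gate` is a hypothetical name.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def coordinate_gate(feature):
    """Simplified coordinate-attention re-weighting (hypothetical helper).

    feature: list of C channels, each an H x W list of lists of floats.
    Returns a tensor of the same shape, re-weighted by a per-row gate
    and a per-column gate computed separately for every channel.
    """
    out = []
    for channel in feature:
        h, w = len(channel), len(channel[0])
        # Average over the width: one descriptor per row (keeps y position).
        row_desc = [sum(row) / w for row in channel]
        # Average over the height: one descriptor per column (keeps x position).
        col_desc = [sum(channel[i][j] for i in range(h)) / h for j in range(w)]
        # Simplification: full CA passes these descriptors through learned
        # 1x1 convolutions, normalization and a nonlinearity before the
        # sigmoid; here the sigmoid is applied to the raw descriptors.
        g_row = [sigmoid(v) for v in row_desc]
        g_col = [sigmoid(v) for v in col_desc]
        out.append([[channel[i][j] * g_row[i] * g_col[j] for j in range(w)]
                    for i in range(h)])
    return out


# A single 2x2 channel with activation concentrated in the top-left cell.
fmap = [[[2.0, 0.0],
         [0.0, 0.0]]]
print(coordinate_gate(fmap))
```

Because the row gate and the column gate multiply, a cell receives the largest boost only when both its row and its column carry high activation, which is how this style of attention localizes a response in two dimensions.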

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12192296/full.md

## Figures

16 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12192296/full.md

## References

68 references — full list in the complete paper: https://tomesphere.com/paper/PMC12192296/full.md

---
Source: https://tomesphere.com/paper/PMC12192296