# HEViTPose: towards high-accuracy and efficient 2D human pose estimation with cascaded group spatial reduction attention

**Authors:** Chengpeng Wu, Zhidong Chen, Beihua Ying, Guangxing Tan, Bing Hu, Chunyu Li, Haifeng Chen

PMC · DOI: 10.1038/s41598-026-35859-x · Scientific Reports · 2026-01-17

## TL;DR

This paper introduces HEViTPose, a lightweight vision transformer for 2D human pose estimation that performs on par with state-of-the-art models on MPII and COCO while using fewer parameters and less computation and running inference faster.

## Contribution

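Proposes HEViTPose, which combines two new mechanisms: Patch Embedded Overlap Width (PEOW), an adjustable amount of patch overlap used to capture local continuity, and Cascaded Group Spatial Reduction Multi-Head Attention (CGSR-MHA), which lowers memory and computational cost while retaining multiple low-dimensional attention heads.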

## Key findings

- Compared with HRNet at similar performance and inference speed, HEViTPose has 62.1% fewer parameters and requires 43.4% less computation.
- Compared with HRFormer at similar performance and network size, HEViTPose runs inference about 2.6 times faster.
- On the MPII and COCO benchmarks, HEViTPose performs on par with state-of-the-art models while being more lightweight and faster at inference.
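
To make the reported percentages easier to compare, the short sketch below converts them into "fraction remaining" and "times smaller" figures. It uses only the three numbers quoted in this summary (62.1%, 43.4%, and about 2.6 times). The helper names are hypothetical, and reading "2.6 times faster" as a 2.6 times higher throughput is an assumption, not something the summary states.

```python
def remaining_fraction(reduction_percent: float) -> float:
    """Fraction of the baseline that is left after a given percentage reduction."""
    return 1.0 - reduction_percent / 100.0


def size_ratio(reduction_percent: float) -> float:
    """How many times larger the baseline is than the reduced model."""
    return 1.0 / remaining_fraction(reduction_percent)


# Figures quoted in the summary (HEViTPose relative to HRNet).
params_left = remaining_fraction(62.1)  # share of HRNet's parameters that remains
flops_left = remaining_fraction(43.4)   # share of HRNet's computation that remains

# Assumption: "about 2.6 times faster" means 2.6x throughput, so the
# per-image latency relative to HRFormer is roughly 1 / 2.6.
latency_left = 1.0 / 2.6

print(f"parameters remaining: {params_left:.1%}")
print(f"computation remaining: {flops_left:.1%}")
print(f"relative latency (assumed): {latency_left:.1%}")
print(f"HRNet / HEViTPose parameter ratio: {size_ratio(62.1):.2f}x")
```

These are relative figures only; the summary does not give the absolute parameter counts, FLOPs, or frame rates behind them.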

## Abstract

Transformer-based human pose estimation methods have made encouraging progress in improving performance. However, the excellent performance of pose networks is often accompanied by heavy computational costs and large network scale. In order to deal with this problem, this paper proposes a High-accuracy and Efficient Vision Transformer for Human Pose Estimation (HEViTPose). Firstly, the concept of Patch Embedded Overlap Width (PEOW) is proposed to help understand the relationship between the amount of overlap and local continuity. By explicitly adjusting the PEOW value, the model’s capacity to capture local continuity information is enhanced. Secondly, a Cascaded Group Spatial Reduction Multi-Head Attention (CGSR-MHA) is proposed, which improves memory efficiency through feature grouping, reduces computational cost through spatial reduction, and also improves network performance by retaining multiple low-dimensional attention heads. Finally, comprehensive experiments on two benchmark datasets (MPII and COCO) demonstrate that the HEViTPose model performs on par with the state-of-the-art models, but is more lightweight while possessing higher inference speed. Specifically, compared with HRNet with similar performance and inference speed, the proposed model reduces the number of parameters by 62.1% and the amount of computation by 43.4%. Compared with HRFormer with similar performance and network size, the inference speed is about 2.6 times faster. Code and models are available at https://github.com/T1sweet/HEViTPose.
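
The two ideas named in the abstract can be illustrated with some simple counting. The sketch below is not the authors' implementation. It assumes that PEOW can be read as kernel size minus stride in a convolution-style patch embedding, and that spatial reduction downsamples the key/value feature map by an integer factor per axis, as in other spatial-reduction attention designs. All function names and the example sizes are hypothetical, and the feature grouping and cascading parts of CGSR-MHA are not modelled.

```python
def overlap_width(kernel: int, stride: int) -> int:
    """Assumed PEOW: pixels shared by two adjacent patches along one axis."""
    return max(kernel - stride, 0)


def patch_grid(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Number of patches along one axis for a strided, convolution-style embedding."""
    return (size + 2 * padding - kernel) // stride + 1


def reduced_key_count(height: int, width: int, reduction: int) -> int:
    """Key/value tokens left after downsampling the map by `reduction` per axis."""
    return (height // reduction) * (width // reduction)


def attention_score_entries(num_queries: int, num_keys: int, heads: int = 1) -> int:
    """Entries in the query-key score matrices, a rough proxy for attention cost."""
    return heads * num_queries * num_keys


# Non-overlapping vs. overlapping patch embedding on a hypothetical 256-pixel axis.
# Both give the same grid size, but only the second shares pixels between patches.
print(overlap_width(4, 4), patch_grid(256, kernel=4, stride=4))
print(overlap_width(7, 4), patch_grid(256, kernel=7, stride=4, padding=3))

# Attention cost on a hypothetical 64 x 48 feature map, with and without
# reducing the key/value map by a factor of 2 per axis.
h, w = 64, 48
full = attention_score_entries(h * w, h * w)
reduced = attention_score_entries(h * w, reduced_key_count(h, w, reduction=2))
print(full // reduced)
```

Under these assumptions, a larger kernel at a fixed stride increases the overlap without changing the number of patches, and a reduction factor of r per axis shrinks the score matrix by about r squared, because the queries keep full resolution while the keys and values are downsampled.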

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12890907/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12890907/full.md

## References

13 references — full list in the complete paper: https://tomesphere.com/paper/PMC12890907/full.md

---
Source: https://tomesphere.com/paper/PMC12890907