Spectral Compression Transformer with Line Pose Graph for Monocular 3D Human Pose Estimation

Zenghao Zheng; Lianping Yang; Hegui Zhu; Mingrui Ye

arXiv:2505.21309 · cs.CV · October 10, 2025

TL;DR

This paper introduces the Spectral Compression Transformer (SCT) with a Line Pose Graph for monocular 3D human pose estimation. SCT shortens the feature sequence inside the encoder to cut frame-to-frame redundancy and computational cost, and the model reports a state-of-the-art MPJPE of 37.7 mm on Human3.6M.

Contribution

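The paper proposes a spectral compression step, based on the Discrete Cosine Transform, that reduces the length of the hidden feature sequence between transformer blocks, together with a Line Pose Graph that adds skeletal structure to the input. Together they aim to improve both the efficiency and the accuracy of transformer-based 3D human pose estimation.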

Findings

01 Achieves state-of-the-art MPJPE of 37.7 mm on Human3.6M

02 Reduces computational cost through spectral sequence compression

03 Enriches input with skeletal structure via Line Pose Graph
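
For readers unfamiliar with the metric in the first finding, MPJPE (mean per-joint position error) is the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over joints and frames. A minimal NumPy sketch follows. The function name and array layout are illustrative assumptions, and any alignment step that an evaluation protocol may apply beforehand (such as root-joint centering) is omitted.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error.

    pred, gt: arrays of shape (..., joints, 3) in the same unit (e.g. mm).
    Returns the mean Euclidean distance over all leading axes and joints.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    # Per-joint Euclidean distance, then average over everything else.
    per_joint_error = np.linalg.norm(pred - gt, axis=-1)
    return float(per_joint_error.mean())
```

If every predicted joint is offset from its ground truth by the vector (3, 4, 0) mm, each per-joint distance is 5 mm, so the function returns 5.0.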

Abstract

Transformer-based 3D human pose estimation methods suffer from high computational costs due to the quadratic complexity of self-attention with respect to sequence length. Additionally, pose sequences often contain significant redundancy between frames. However, recent methods typically fail to improve model capacity while effectively eliminating sequence redundancy. In this work, we introduce the Spectral Compression Transformer (SCT) to reduce sequence length and accelerate computation. The SCT encoder treats hidden features between blocks as Temporal Feature Signals (TFS) and applies the Discrete Cosine Transform, a Fourier transform-based technique, to determine the spectral components to be retained. By filtering out certain high-frequency noise components, SCT compresses the sequence length and reduces redundancy. To further enrich the input sequence with prior structural…
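
The compression idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes one plausible reading, in which the hidden features of shape (frames, channels) are transformed along time with an orthonormal DCT-II, only the lowest-frequency coefficients are kept, and a shorter inverse DCT maps them back to a shorter sequence. The function names, the `keep` parameter, and the amplitude rescaling are all illustrative assumptions.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size (n, n); row k is frequency k."""
    k = np.arange(n)[:, None]          # frequency index
    t = np.arange(n)[None, :]          # time index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    m[0] /= np.sqrt(2.0)               # scale the DC row so m @ m.T == I
    return m


def spectral_compress(x, keep):
    """Shorten a (frames, channels) signal to `keep` frames via the DCT.

    Assumed scheme (not taken from the paper): drop the high-frequency
    coefficients, then invert with a DCT of length `keep`.
    """
    frames = x.shape[0]
    if not 1 <= keep <= frames:
        raise ValueError("keep must be between 1 and the number of frames")
    coeffs = dct_matrix(frames) @ x    # (frames, channels) spectrum along time
    low = coeffs[:keep]                # retain the low-frequency components
    # Inverse DCT of length `keep`; the factor preserves signal amplitude.
    return np.sqrt(keep / frames) * (dct_matrix(keep).T @ low)
```

Because self-attention cost grows with the square of the sequence length, the saving compounds. Taking hypothetical frame counts, compressing 243 frames to 81 cuts the number of pairwise attention scores per head from 243² = 59,049 to 81² = 6,561, a 9x reduction.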
