Spectral Compression Transformer with Line Pose Graph for Monocular 3D Human Pose Estimation
Zenghao Zheng, Lianping Yang, Hegui Zhu, Mingrui Ye

TL;DR
This paper introduces a Spectral Compression Transformer with a Line Pose Graph to efficiently estimate 3D human poses from monocular video, reducing temporal redundancy and computational cost while achieving state-of-the-art accuracy.
Contribution
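The paper proposes a spectral compression technique that shortens the temporal sequence processed by the transformer, together with a Line Pose Graph that adds skeletal structure to the input, improving both the efficiency and the accuracy of transformer-based 3D human pose estimation.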
Findings
Achieves state-of-the-art MPJPE of 37.7 mm on Human3.6M
Reduces computational cost through spectral sequence compression
Enriches input with skeletal structure via Line Pose Graph
Abstract
Transformer-based 3D human pose estimation methods suffer from high computational costs due to the quadratic complexity of self-attention with respect to sequence length. Additionally, pose sequences often contain significant redundancy between frames. However, recent methods typically fail to improve model capacity while effectively eliminating sequence redundancy. In this work, we introduce the Spectral Compression Transformer (SCT) to reduce sequence length and accelerate computation. The SCT encoder treats hidden features between blocks as Temporal Feature Signals (TFS) and applies the Discrete Cosine Transform, a Fourier transform-based technique, to determine the spectral components to be retained. By filtering out certain high-frequency noise components, SCT compresses the sequence length and reduces redundancy. To further enrich the input sequence with prior structural…
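The abstract describes the spectral compression step only at a high level, so the following is a minimal NumPy sketch of DCT-based temporal compression under stated assumptions: hidden features form a (T, C) array, the DCT runs along the time axis, and compression keeps the first K (lowest-frequency) coefficients as the new, shorter sequence. The helper names `dct_matrix`, `spectral_compress`, and `spectral_reconstruct` and the sizes T=243, C=32, K=81 are hypothetical and not taken from the paper; SCT's actual rule for choosing which spectral components to retain may differ.

```python
import numpy as np


def dct_matrix(t: int) -> np.ndarray:
    """Orthonormal DCT-II basis of shape (t, t); row k holds the k-th cosine."""
    n = np.arange(t)
    k = n[:, None]
    basis = np.cos(np.pi * (n[None, :] + 0.5) * k / t)
    basis[0] *= np.sqrt(1.0 / t)   # scale the DC row
    basis[1:] *= np.sqrt(2.0 / t)  # scale the remaining rows
    return basis


def spectral_compress(features: np.ndarray, keep: int) -> np.ndarray:
    """Hypothetical helper: (T, C) temporal features -> (keep, C) low-frequency tokens.

    Assumption: compression keeps the lowest `keep` DCT coefficients per channel.
    """
    t = features.shape[0]
    if not 0 < keep <= t:
        raise ValueError("keep must lie in [1, T]")
    spectrum = dct_matrix(t) @ features  # DCT along the time axis, per channel
    return spectrum[:keep]               # discard the high-frequency rows


def spectral_reconstruct(coeffs: np.ndarray, t: int) -> np.ndarray:
    """Inverse DCT with the discarded high frequencies set to zero (a low-pass filter)."""
    return dct_matrix(t)[: coeffs.shape[0]].T @ coeffs


# Illustrative sizes only: T frames, C channels, K retained frequencies.
T, C, K = 243, 32, 81
time = np.linspace(0.0, 1.0, T)[:, None]
clean = np.sin(2 * np.pi * np.linspace(0.5, 4.0, C)[None, :] * time)  # smooth signals
noisy = clean + 0.05 * np.random.default_rng(0).standard_normal((T, C))

compressed = spectral_compress(noisy, K)
recon = spectral_reconstruct(compressed, T)
noisy_err = np.linalg.norm(noisy - clean)
recon_err = np.linalg.norm(recon - clean)
attention_ratio = (T / K) ** 2  # attention-score cost scales with sequence length squared

print(compressed.shape)        # (81, 32)
print(attention_ratio)         # 9.0
print(recon_err < noisy_err)   # True
```

Because self-attention cost grows with the square of sequence length, keeping 81 of 243 tokens in this toy setting cuts attention-score work ninefold. The reconstruction step shows the same truncation acting as a low-pass filter: the smooth signal survives while most of the added white noise, spread evenly across all frequencies, is discarded.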
