L2A: Learning to Accumulate Pose History for Accurate 3D Human Pose Estimation
Zehua Wang, Changwang Mei, Huaijiang Sun, Pengqi Hu, Zhaoyang Yin

TL;DR
This paper introduces a history-aware Transformer framework with novel pose accumulation modules that improve 3D human pose estimation by reusing pose history accumulated across network layers.
Contribution
The paper proposes a framework that keeps the representation space consistent across layers and adaptively aggregates historical pose features from earlier layers to improve accuracy.
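To make the contrast between fixed residual connections and cross-layer history aggregation concrete, here is a minimal sketch in plain Python. It assumes every layer maps a feature vector to one of the same size, and it uses a softmax-weighted sum over all earlier states as a generic stand-in for adaptive aggregation. The names `history_aware_stack` and `depth_logits`, and the use of static per-layer weights, are hypothetical; this is not the paper's pose accumulation module.

```python
import math
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def residual_stack(x: Vector, layers: List[Layer]) -> Vector:
    """Baseline with fixed residual connections: x_{l+1} = x_l + f_l(x_l)."""
    for f in layers:
        x = [a + b for a, b in zip(x, f(x))]
    return x


def history_aware_stack(x: Vector, layers: List[Layer],
                        depth_logits: List[List[float]]) -> Vector:
    """Hypothetical history-aware variant: before layer l, take a softmax-weighted
    sum of ALL earlier states (the input plus every previous layer output)."""
    history = [x]
    for l, f in enumerate(layers):
        # depth_logits[l] holds one score per stored state (length l + 1);
        # in a trained model these scores would be learned parameters.
        weights = softmax(depth_logits[l])
        agg = [sum(w * h[i] for w, h in zip(weights, history))
               for i in range(len(x))]
        # An element-wise weighted sum is only meaningful if every stored state
        # lives in the same feature space (same size, same meaning per dimension).
        history.append([a + b for a, b in zip(agg, f(agg))])
    return history[-1]


if __name__ == "__main__":
    def halve(v: Vector) -> Vector:
        return [0.5 * a for a in v]

    x0 = [1.0, 2.0]
    print(residual_stack(x0, [halve, halve]))                            # [2.25, 4.5]
    print(history_aware_stack(x0, [halve, halve], [[0.0], [0.0, 0.0]]))  # [1.875, 3.75]
```

With uniform scores, the second layer sees the average of the input and the first layer's output. Pushing all of the weight onto the most recent state recovers the plain residual stack, so in this sketch the fixed residual path is just one special case of cross-layer aggregation.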
Findings
Achieves state-of-the-art results on benchmark datasets.
Effectively utilizes cross-layer pose history for better 3D pose estimation.
Introduces novel modules for structured pose feature aggregation.
Abstract
Existing 2D-to-3D lifting methods for human pose estimation have achieved strong performance, but they overlook the use of historical pose representations across network depth. In current pipelines, information propagates through fixed residual connections, which limits the effective reuse of early-layer features such as fine-grained spatial structures and short-term motion cues. However, naively incorporating historical features across layers is non-trivial. We further identify that maintaining a consistent representation space across layers is a prerequisite for effective cross-layer feature aggregation. To address this issue, we propose a history-aware framework that enables effective use of historical features across network layers. Specifically, we adopt a spatial-temporal parallel Transformer backbone to prevent alternating spatial-temporal transformations during sequential…
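The abstract is cut off before it describes the backbone in detail, so the following is only a generic illustration of the distinction it draws: alternating (sequential) versus parallel composition of a spatial and a temporal operator. It is a minimal sketch, assuming a clip stored as a frames-by-joints grid of scalars and using mean pooling as a hypothetical stand-in for spatial and temporal self-attention. The functions `spatial_mix`, `temporal_mix`, `alternating_block`, `parallel_block` and the fixed fusion weight `alpha` are illustrative names, not the paper's modules.

```python
from typing import List

Grid = List[List[float]]  # grid[t][j]: one scalar feature for frame t, joint j


def spatial_mix(x: Grid) -> Grid:
    """Toy spatial operator: every joint in a frame gets that frame's mean."""
    return [[sum(frame) / len(frame)] * len(frame) for frame in x]


def temporal_mix(x: Grid) -> Grid:
    """Toy temporal operator: every frame of a joint gets that joint's mean over time."""
    num_frames, num_joints = len(x), len(x[0])
    joint_means = [sum(x[t][j] for t in range(num_frames)) / num_frames
                   for j in range(num_joints)]
    return [list(joint_means) for _ in range(num_frames)]


def alternating_block(x: Grid) -> Grid:
    """Sequential S -> T: the temporal operator only sees spatially mixed features."""
    return temporal_mix(spatial_mix(x))


def parallel_block(x: Grid, alpha: float = 0.5) -> Grid:
    """Parallel S || T: both operators read the same input and their outputs are
    fused with a fixed weight alpha (hypothetical; a model could learn it)."""
    s, t = spatial_mix(x), temporal_mix(x)
    return [[alpha * a + (1.0 - alpha) * b for a, b in zip(row_s, row_t)]
            for row_s, row_t in zip(s, t)]


if __name__ == "__main__":
    poses = [[1.0, 3.0],   # frame 0: joints 0 and 1
             [5.0, 7.0]]   # frame 1: joints 0 and 1
    print(alternating_block(poses))  # [[4.0, 4.0], [4.0, 4.0]]
    print(parallel_block(poses))     # [[2.5, 3.5], [4.5, 5.5]]
```

With these toy mean operators, the alternating block collapses the clip to a single global value, while the parallel block keeps a per-frame term from the spatial branch and a per-joint term from the temporal branch. Real attention layers would not collapse this way; the toy only makes the difference in data flow visible.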
