SPiKE: 3D Human Pose from Point Cloud Sequences
Irene Ballester, Ondřej Peterka, Martin Kampel

TL;DR
SPiKE is a Transformer-based method for 3D human pose estimation from point cloud sequences. It uses temporal information across frames to improve accuracy and efficiency over existing single-frame approaches.
Contribution
SPiKE is the first method to use a Transformer architecture for spatio-temporal encoding in 3D human pose estimation from point cloud sequences, and it achieves state-of-the-art results.
Findings
Achieves 89.19% mAP on the ITOP benchmark.
Outperforms existing methods in accuracy and inference speed.
Validates the importance of temporal context in 3D human pose estimation (HPE).
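For readers unfamiliar with the mAP figure reported above, the following is a minimal sketch of how such a score is typically computed for 3D keypoints. It assumes the convention commonly used with ITOP, where a joint counts as detected if its prediction lies within 10 cm of the ground truth; this summary does not state the threshold, so treat it as an assumption. The function name `joint_detection_rates` and the toy data are hypothetical.

```python
import numpy as np

def joint_detection_rates(pred, gt, threshold=0.10):
    """Per-joint detection rate and its mean over joints.

    pred, gt: arrays of shape (num_frames, num_joints, 3), in metres.
    threshold: assumed 10 cm cutoff (not stated in this summary).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    dist = np.linalg.norm(pred - gt, axis=-1)   # (num_frames, num_joints)
    hits = dist <= threshold                    # True where a joint is "detected"
    per_joint = hits.mean(axis=0)               # fraction of frames detected, per joint
    return per_joint, float(per_joint.mean())

# Toy data: 2 frames, 3 joints, ground truth at the origin.
gt = np.zeros((2, 3, 3))
pred = gt.copy()
pred[0, 0, 0] = 0.05   # 5 cm error: still a hit
pred[1, 2, 2] = 0.25   # 25 cm error: a miss

per_joint, mean_ap = joint_detection_rates(pred, gt)
```

Here the third joint is missed in one of its two frames, so its rate is 0.5 while the other two joints score 1.0.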
Abstract
3D Human Pose Estimation (HPE) is the task of locating keypoints of the human body in 3D space from 2D or 3D representations such as RGB images, depth maps or point clouds. Current HPE methods from depth and point clouds predominantly rely on single-frame estimation and do not exploit temporal information from sequences. This paper presents SPiKE, a novel approach to 3D HPE using point cloud sequences. Unlike existing methods that process frames of a sequence independently, SPiKE leverages temporal context by adopting a Transformer architecture to encode spatio-temporal relationships between points across the sequence. By partitioning the point cloud into local volumes and using spatial feature extraction via point spatial convolution, SPiKE ensures efficient processing by the Transformer while preserving spatial integrity per timestamp. Experiments on the ITOP benchmark for 3D HPE show…
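The pipeline described in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: farthest point sampling, the ball radius, the hand-made max-pooled features (a stand-in for the learned point spatial convolution), the appended frame index, and the single attention head with random weights are all assumptions chosen to keep the example small. It shows only the overall shape of the idea: each frame is split into local volumes on its own, each volume becomes one token, and attention then runs over the tokens of all frames together.

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Pick k well-spread anchor indices from an (N, 3) point cloud."""
    chosen = [0]
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dist))              # point farthest from all anchors so far
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return np.array(chosen)

def local_volume_tokens(points, t, k, radius):
    """One token per local volume of a single frame.

    Token layout (hypothetical): anchor xyz, frame index t, max-pooled neighbour offsets.
    """
    anchors = points[farthest_point_sampling(points, k)]
    tokens = []
    for a in anchors:
        mask = np.linalg.norm(points - a, axis=1) <= radius   # always contains the anchor
        offsets = points[mask] - a
        tokens.append(np.concatenate([a, [float(t)], offsets.max(axis=0)]))
    return np.stack(tokens)                     # (k, 7)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
T, N, K, D = 3, 256, 8, 7                       # frames, points per frame, volumes, token size
sequence = rng.normal(size=(T, N, 3))           # synthetic stand-in for a point cloud sequence

# Spatial step: each frame is tokenised independently of the others.
tokens = np.concatenate(
    [local_volume_tokens(frame, t, K, radius=0.5) for t, frame in enumerate(sequence)]
)                                               # (T * K, D)

# Temporal step: attention mixes tokens across all frames of the sequence.
wq, wk, wv = (rng.normal(size=(D, D)) for _ in range(3))
out, weights = self_attention(tokens, wq, wk, wv)
```

Reducing each frame to K tokens before attention is what keeps the sequence length manageable: attention cost grows with (T * K) squared rather than (T * N) squared.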
Taxonomy
Topics: 3D Shape Modeling and Analysis · Human Pose and Action Recognition · Gait Recognition and Analysis
Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Softmax · Label Smoothing · Linear Layer · Adam · Dropout · Layer Normalization · Dense Connections
