SPiKE: 3D Human Pose from Point Cloud Sequences

Irene Ballester; Ondřej Peterka; Martin Kampel

arXiv:2409.01879 · cs.CV · September 4, 2024

TL;DR

SPiKE introduces a Transformer-based method for 3D human pose estimation from point cloud sequences, effectively leveraging temporal information to improve accuracy and efficiency over existing single-frame approaches.

Contribution

SPiKE is presented as the first method to use a Transformer architecture for spatio-temporal encoding in 3D human pose estimation from point cloud sequences, reporting state-of-the-art results on the ITOP benchmark.

Findings

01 Achieves 89.19% mAP on the ITOP benchmark.

02 Outperforms existing methods in both accuracy and inference speed.

03 Validates the importance of temporal context in 3D HPE.
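
To make the mAP figure above concrete, here is a minimal sketch of a threshold-based detection metric for 3D joints. It assumes the convention commonly used on ITOP, where a predicted joint counts as correct if it lies within 10 cm of the ground truth; this page does not state the threshold, so treat `threshold=0.10` as an assumption. The function name `map_at_threshold` and the toy coordinates are hypothetical and not taken from the SPiKE repository.

```python
import math

def map_at_threshold(pred_frames, gt_frames, threshold=0.10):
    """Per-joint and mean detection rate at a distance threshold.

    pred_frames, gt_frames: lists of frames; each frame is a list of
    (x, y, z) joint positions in meters, in the same joint order.
    threshold: assumed 10 cm, the value conventionally used on ITOP.
    """
    num_joints = len(gt_frames[0])
    hits = [0] * num_joints
    for pred, gt in zip(pred_frames, gt_frames):
        for j in range(num_joints):
            # A joint is "detected" if its Euclidean error is within the threshold.
            if math.dist(pred[j], gt[j]) <= threshold:
                hits[j] += 1
    per_joint = [h / len(gt_frames) for h in hits]
    return per_joint, sum(per_joint) / num_joints

# Toy data: two frames, two joints (values are illustrative only).
gt = [[(0, 0, 0), (0, 1, 0)], [(0, 0, 0), (0, 1, 0)]]
pred = [[(0.05, 0, 0), (0, 1.2, 0)], [(0, 0, 0.02), (0, 1.05, 0)]]
per_joint, mean_ap = map_at_threshold(pred, gt)
print(per_joint, mean_ap)  # [1.0, 0.5] 0.75
```

In the toy data the first joint is within 10 cm in both frames and the second joint in only one, so the mean over joints is 75%.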

Abstract

3D Human Pose Estimation (HPE) is the task of locating keypoints of the human body in 3D space from 2D or 3D representations such as RGB images, depth maps or point clouds. Current HPE methods from depth and point clouds predominantly rely on single-frame estimation and do not exploit temporal information from sequences. This paper presents SPiKE, a novel approach to 3D HPE using point cloud sequences. Unlike existing methods that process frames of a sequence independently, SPiKE leverages temporal context by adopting a Transformer architecture to encode spatio-temporal relationships between points across the sequence. By partitioning the point cloud into local volumes and using spatial feature extraction via point spatial convolution, SPiKE ensures efficient processing by the Transformer while preserving spatial integrity per timestamp. Experiments on the ITOP benchmark for 3D HPE show…
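
The partitioning step mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the local volumes are built, so the choices here are assumptions borrowed from common point cloud practice: anchors are picked by farthest point sampling and each volume collects the points within a fixed radius of its anchor (a ball query). Each frame is partitioned on its own, which mirrors the idea of keeping spatial structure intact per timestamp. All function names are hypothetical, and the feature extraction and Transformer stages are not shown.

```python
import math

def farthest_point_sampling(points, num_anchors):
    """Return indices of anchors spread across the cloud, starting at index 0."""
    chosen = [0]
    # Distance from every point to its nearest already-chosen anchor.
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < num_anchors:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen

def local_volumes(points, num_anchors, radius):
    """Group one frame into (anchor, member_points) pairs via a ball query."""
    anchors = farthest_point_sampling(points, num_anchors)
    return [
        (points[a], [p for p in points if math.dist(p, points[a]) <= radius])
        for a in anchors
    ]

def partition_sequence(frames, num_anchors, radius):
    """Partition every frame independently; output keeps the frame order."""
    return [local_volumes(frame, num_anchors, radius) for frame in frames]

# Toy sequence with a single frame made of two well-separated clusters.
frames = [[(0, 0, 0), (0.1, 0, 0), (5, 0, 0), (5.1, 0, 0)]]
for anchor, members in partition_sequence(frames, num_anchors=2, radius=0.5)[0]:
    print(anchor, members)
```

In a full model, each volume would be reduced to a feature vector and the resulting per-frame tokens would be passed to the Transformer together across the sequence; that part depends on details this page does not give.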

Code & Models

Repositories

iballester/SPiKE (PyTorch, official)

Taxonomy

Topics: 3D Shape Modeling and Analysis · Human Pose and Action Recognition · Gait Recognition and Analysis

Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Softmax · Label Smoothing · Linear Layer · Adam · Dropout · Layer Normalization · Dense Connections