MAEPose: Self-Supervised Spatiotemporal Learning for Human Pose Estimation on mmWave Video

Xijia Wei, Yuan Fang, Kevin Chetty, Youngjun Cho, and Nadia Bianchi-Berthouze

arXiv:2605.00242·cs.CV·May 4, 2026

TL;DR

MAEPose introduces a self-supervised, spatiotemporal learning method for human pose estimation directly from mmWave radar videos, outperforming existing approaches and reducing system complexity.

Contribution

The paper presents MAEPose, a masked autoencoding framework that learns from unlabelled radar videos and improves pose estimation accuracy without relying on intermediate representations.

Findings

01. Outperforms state-of-the-art methods by up to 22.1% in MPJPE (mean per-joint position error)

02. Remains robust under zero-shot bystander interference, with only a 6.5% increase in error

03. Uses Range-Doppler videos as input, giving better performance at lower computational cost
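
For readers unfamiliar with the metric behind these findings, the sketch below shows how MPJPE is commonly computed: the Euclidean distance between each predicted joint and its ground-truth position, averaged over all joints and frames. The function names, the nested-list input layout, and the example numbers are illustrative assumptions and are not taken from the paper; the summary also does not say exactly how the reported percentages were derived, so `relative_change` shows only one plausible reading (a relative change against a baseline error).

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error.

    pred and target are lists of frames; each frame is a list of joints;
    each joint is a tuple of coordinates (assumed layout, e.g. (x, y, z)).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of frames")
    total, count = 0.0, 0
    for frame_pred, frame_gt in zip(pred, target):
        if len(frame_pred) != len(frame_gt):
            raise ValueError("each frame must have the same number of joints")
        for joint_pred, joint_gt in zip(frame_pred, frame_gt):
            total += math.dist(joint_pred, joint_gt)  # Euclidean distance
            count += 1
    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count


def relative_change(baseline, value):
    """Percentage change of value relative to baseline (negative = lower error)."""
    return (value - baseline) / baseline * 100.0


# Hypothetical data: one frame with two joints, off by 3 and 4 units.
pred = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
target = [[(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]]
example_error = mpjpe(pred, target)
example_change = relative_change(100.0, 80.0)
```

A lower MPJPE is better, so an improvement over a baseline appears as a negative relative change, while an "error increase" under interference appears as a positive one.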

Abstract

Millimetre-wave (mmWave) radar offers a more privacy-preserving alternative to RGB-based human pose estimation. However, existing methods typically rely on pre-extracted intermediate representations such as sparse point clouds or spectrogram images. These representations discard the rich spatiotemporal information naturally present in radar video streams, and the signal processing needed to produce them adds system complexity. In addition, existing solutions are mainly trained end-to-end in a supervised manner, without leveraging unlabelled raw video streams to learn generalized representations. In this study, we present MAEPose, a masked autoencoding-based human pose estimation approach that operates directly on mmWave spectrogram videos. MAEPose learns spatiotemporal motion-aware generalized representations from unlabelled radar video, and leverages its heatmap decoder for multi-frame pose…
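
To make the idea of masked autoencoding on video more concrete, the sketch below shows the generic masking step: a clip is divided into spatiotemporal patches, a random subset is hidden, and only the visible patches would be passed to an encoder while a decoder is trained to reconstruct the rest. This is a minimal illustration of the general technique, not MAEPose's actual procedure. The patch size, the 75% mask ratio, the uniform random sampling, and the function name `random_patch_mask` are all assumptions, since the abstract does not specify them.

```python
import random


def random_patch_mask(num_frames, height, width,
                      patch=(2, 16, 16), mask_ratio=0.75, seed=0):
    """Pick which spatiotemporal patches of a video clip to hide.

    Returns the patch grid shape plus sorted lists of visible and masked
    patch indices (flattened in frame-major order).
    """
    pt, ph, pw = patch
    if num_frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")

    grid = (num_frames // pt, height // ph, width // pw)
    num_patches = grid[0] * grid[1] * grid[2]
    num_masked = int(num_patches * mask_ratio)

    rng = random.Random(seed)  # seeded so the example is reproducible
    masked = set(rng.sample(range(num_patches), num_masked))
    visible = [i for i in range(num_patches) if i not in masked]
    return grid, visible, sorted(masked)


# Hypothetical clip: 16 frames of 64x64 spectrogram images.
grid, visible, masked = random_patch_mask(16, 64, 64)
```

With these assumed settings the clip splits into an 8 x 4 x 4 grid of 128 patches, of which 96 are masked and 32 remain visible. A high mask ratio of this kind is typical of masked autoencoders because it forces the model to infer motion and structure from sparse context rather than copy neighbouring pixels.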
