RAPTR: Radar-based 3D Pose Estimation using Transformer
Sorachi Kato, Ryoma Yataka, Pu Perry Wang, Pedro Miraldo, Takuya Fujihashi, Petros Boufounos

TL;DR
RAPTR introduces a radar-based 3D human pose estimation method that operates under weak supervision, using only 3D bounding boxes and 2D keypoints, and employs a two-stage transformer architecture to improve accuracy in complex indoor environments.
Contribution
The paper presents RAPTR, a novel weakly supervised radar-based 3D pose estimation framework with a two-stage transformer architecture and pseudo-3D deformable attention, reducing reliance on costly keypoint labels.
Findings
Outperforms existing methods on two indoor radar datasets, HIBER and MMVR.
Reduces joint position error by 34.3% on HIBER.
Reduces joint position error by 76.9% on MMVR.
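To make the percentages above concrete, here is a minimal sketch of how a joint position error and its relative reduction could be computed. It assumes the error is the mean Euclidean distance between predicted and ground-truth 3D joints; this summary does not give the paper's exact metric definition. The function names `mpjpe` and `relative_reduction` and all numbers are hypothetical and chosen for illustration only.

```python
import math

def mpjpe(pred, gt):
    """Mean Euclidean distance between matching 3D joints (assumed metric)."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error

# Toy joints in metres; these values are NOT taken from the paper.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
baseline = [(0.0, 0.0, 0.2), (1.0, 0.2, 0.0)]
improved = [(0.0, 0.0, 0.1), (1.0, 0.1, 0.0)]

baseline_err = mpjpe(baseline, gt)
improved_err = mpjpe(improved, gt)
print(f"baseline error: {baseline_err:.3f} m")
print(f"improved error: {improved_err:.3f} m")
print(f"relative reduction: {relative_reduction(baseline_err, improved_err):.1f}%")
```

Read this way, a 34.3% reduction means the new error is roughly two thirds of the baseline error, and a 76.9% reduction means it is under a quarter of it.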
Abstract
Radar-based indoor 3D human pose estimation has typically relied on fine-grained 3D keypoint labels, which are costly to obtain, especially in complex indoor settings involving clutter, occlusions, or multiple people. In this paper, we propose RAPTR (RAdar Pose esTimation using tRansformer) under weak supervision, using only 3D BBox and 2D keypoint labels, which are considerably easier and more scalable to collect. RAPTR is characterized by a two-stage pose decoder architecture with pseudo-3D deformable attention to enhance (pose/joint) queries with multi-view radar features: a pose decoder estimates initial 3D poses with a 3D template loss designed to utilize the 3D BBox labels and mitigate depth ambiguities; and a joint decoder refines the initial poses with 2D keypoint labels and a 3D gravity loss. Evaluated on two indoor radar datasets, RAPTR outperforms existing methods,…
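The depth ambiguity mentioned in the abstract can be illustrated with a small sketch of 2D keypoint supervision. It assumes a simple pinhole camera and an L1 loss on projected joints; the abstract does not specify RAPTR's projection model or loss form, so `project_pinhole`, `keypoint_2d_loss`, and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names and values, not the authors' implementation.

```python
def project_pinhole(point, fx, fy, cx, cy):
    """Project a 3D point (x, y, z) to pixel coordinates (assumed pinhole model)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)

def keypoint_2d_loss(pred_3d, labels_2d, fx, fy, cx, cy):
    """Mean L1 distance between projected 3D joints and 2D keypoint labels."""
    if len(pred_3d) != len(labels_2d) or not pred_3d:
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for point, (u_gt, v_gt) in zip(pred_3d, labels_2d):
        u, v = project_pinhole(point, fx, fy, cx, cy)
        total += abs(u - u_gt) + abs(v - v_gt)
    return total / len(pred_3d)

# Hypothetical intrinsics and toy data, for illustration only.
intrinsics = dict(fx=100.0, fy=100.0, cx=50.0, cy=50.0)
labels_2d = [(50.0, 50.0), (100.0, 50.0)]

near_pose = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
far_pose = [(0.0, 0.0, 4.0), (2.0, 0.0, 4.0)]  # same pose scaled 2x in depth

# Both poses project onto the same pixels, so the 2D loss cannot tell them apart.
print(keypoint_2d_loss(near_pose, labels_2d, **intrinsics))
print(keypoint_2d_loss(far_pose, labels_2d, **intrinsics))
```

Because a pose scaled along the viewing ray yields the same 2D loss, 2D keypoint labels alone leave depth unconstrained. This is the kind of ambiguity that the abstract says the 3D template loss, built on 3D BBox labels, is designed to mitigate.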
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Hand Gesture Recognition Systems
