Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection
Hang Ye, Wentao Zhu, Chunyu Wang, Rujie Wu, Yizhou Wang

TL;DR
Faster VoxelPose introduces a re-projection technique that estimates multi-person 3D human poses from multi-camera data in real time, significantly reducing computation while maintaining competitive accuracy.
Contribution
The paper proposes a re-projection approach that projects the 3D feature volume onto the three 2D coordinate planes, avoiding heavy 3D-CNNs and yielding a tenfold speedup over VoxelPose in multi-person 3D pose estimation.
Findings
Achieves ten times faster inference than VoxelPose.
Maintains competitive accuracy with state-of-the-art approaches.
Effective for real-time multi-person 3D pose estimation.
Abstract
While voxel-based methods have achieved promising results for multi-person 3D pose estimation from multiple cameras, they suffer from heavy computational burdens, especially in large scenes. We present Faster VoxelPose to address this challenge by re-projecting the feature volume onto the three two-dimensional coordinate planes and estimating the X, Y, and Z coordinates from them separately. To that end, we first localize each person with a 3D bounding box by estimating a 2D box and its height from the volume features projected onto the xy-plane and the z-axis, respectively. Then, for each person, we estimate partial joint coordinates from the three coordinate planes separately, which are then fused to obtain the final 3D pose. The method is free from costly 3D-CNNs and improves the speed of VoxelPose by ten times while achieving accuracy competitive with state-of-the-art methods, proving its…
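The re-projection idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the max-pooling projection, the argmax readout, and the simple averaging used for fusion are stand-in assumptions for the learned 2D networks and fusion step, and the names make_volume, project, argmax_2d, and estimate_joint are hypothetical. It only shows that each coordinate plane yields two of the three coordinates, so each coordinate is estimated twice and can then be fused.

```python
from itertools import product


def make_volume(nx, ny, nz, peak):
    """Toy single-channel feature volume, indexed volume[x][y][z], with one peak."""
    px, py, pz = peak
    return [[[1.0 / (1 + (x - px) ** 2 + (y - py) ** 2 + (z - pz) ** 2)
              for z in range(nz)]
             for y in range(ny)]
            for x in range(nx)]


def project(volume, drop_axis):
    """Max-pool along drop_axis (assumed projection), leaving a 2D map."""
    dims = [len(volume), len(volume[0]), len(volume[0][0])]
    keep = [axis for axis in range(3) if axis != drop_axis]
    plane = [[0.0] * dims[keep[1]] for _ in range(dims[keep[0]])]
    for i, j in product(range(dims[keep[0]]), range(dims[keep[1]])):
        best = float("-inf")
        for k in range(dims[drop_axis]):
            idx = [0, 0, 0]
            idx[keep[0]], idx[keep[1]], idx[drop_axis] = i, j, k
            best = max(best, volume[idx[0]][idx[1]][idx[2]])
        plane[i][j] = best
    return plane


def argmax_2d(plane):
    """Return the (row, col) index of the largest value in a 2D map."""
    cells = product(range(len(plane)), range(len(plane[0])))
    return max(cells, key=lambda ij: plane[ij[0]][ij[1]])


def estimate_joint(volume):
    """Read partial coordinates off each plane, then fuse them by averaging."""
    x_xy, y_xy = argmax_2d(project(volume, 2))  # xy-plane: pooled over z
    x_xz, z_xz = argmax_2d(project(volume, 1))  # xz-plane: pooled over y
    y_yz, z_yz = argmax_2d(project(volume, 0))  # yz-plane: pooled over x
    # Averaging is an assumed stand-in for the fusion step.
    return ((x_xy + x_xz) / 2, (y_xy + y_yz) / 2, (z_xz + z_yz) / 2)


volume = make_volume(6, 5, 4, peak=(4, 1, 2))
print(estimate_joint(volume))
```

The three 2D maps together hold far fewer cells than the full volume, which is the intuition behind replacing 3D convolutions with 2D ones; a single projection can be ambiguous when several people overlap along the pooled axis, which is why the abstract describes localizing each person with a 3D bounding box first.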
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Diabetic Foot Ulcer Assessment and Management
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
