Stereo-Inertial Poser: Towards Metric-Accurate Shape-Aware Motion Capture Using Sparse IMUs and a Single Stereo Camera
Tutian Tang, Xingyu Ji, Yutong Li, MingHao Liu, Wenqiang Xu, Cewu Lu

TL;DR
Stereo-Inertial Poser is a real-time, shape-aware motion capture system that combines a single stereo camera with six sparse IMUs to estimate metric-accurate 3D human motion, resolving monocular depth ambiguity and accounting for anthropometric variation.
Contribution
It introduces a novel fusion of stereo vision and IMUs with a shape-aware module for accurate, real-time 3D human motion capture.
Findings
Runs in real time at over 200 FPS.
Demonstrates state-of-the-art accuracy on various datasets.
Produces drift-free global translation and reduces foot-skating effects.
Abstract
Recent advancements in visual-inertial motion capture systems have demonstrated the potential of combining monocular cameras with sparse inertial measurement units (IMUs) as cost-effective solutions, which effectively mitigate occlusion and drift issues inherent in single-modality systems. However, they are still limited by metric inaccuracies in global translations stemming from monocular depth ambiguity, and shape-agnostic local motion estimations that ignore anthropometric variations. We present Stereo-Inertial Poser, a real-time motion capture system that leverages a single stereo camera and six IMUs to estimate metric-accurate and shape-aware 3D human motion. By replacing the monocular RGB with stereo vision, our system resolves depth ambiguity through calibrated baseline geometry, enabling direct 3D keypoint extraction and body shape parameter estimation. IMU data and visual cues…
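To make the "calibrated baseline geometry" idea concrete, here is a minimal sketch of how a matched 2D keypoint pair from a rectified stereo rig can be lifted to a metric 3D point using the textbook pinhole relation Z = f · B / d. This is not the paper's pipeline. The function name `triangulate_rectified`, its parameters, and the assumption of a rectified pair with a shared focal length and principal point are all illustrative choices, not details from the source.

```python
import numpy as np


def triangulate_rectified(uv_left, uv_right, focal_px, baseline_m, cx, cy):
    """Lift matched keypoints from a rectified stereo pair to metric 3D.

    Hypothetical helper for illustration only. Assumes both images are
    rectified, share the focal length `focal_px` (pixels) and principal
    point (`cx`, `cy`), and are separated horizontally by `baseline_m` metres.
    """
    uv_left = np.asarray(uv_left, dtype=float)
    uv_right = np.asarray(uv_right, dtype=float)

    # Disparity is the horizontal pixel shift between the two views.
    disparity = uv_left[:, 0] - uv_right[:, 0]
    if np.any(disparity <= 0):
        raise ValueError("disparity must be positive for points in front of the rig")

    # Depth follows from similar triangles: Z = f * B / d.
    z = focal_px * baseline_m / disparity
    # Back-project the left-image pixel through the pinhole model.
    x = (uv_left[:, 0] - cx) * z / focal_px
    y = (uv_left[:, 1] - cy) * z / focal_px
    return np.stack([x, y, z], axis=1)


# One keypoint seen at x=420 px (left) and x=400 px (right), same row.
points = triangulate_rectified(
    [[420.0, 240.0]], [[400.0, 240.0]],
    focal_px=500.0, baseline_m=0.1, cx=320.0, cy=240.0,
)
print(points)
```

Because the baseline is known in metres, the recovered depth is metric rather than defined only up to scale, which is the property a monocular camera lacks.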
Taxonomy
Topics: Human Pose and Action Recognition · Balance, Gait, and Falls Prevention · Robotics and Sensor-Based Localization
