BodySLAM++: Fast and Tightly-Coupled Visual-Inertial Camera and Human Motion Tracking
Dorian F. Henning, Christopher Choi, Simon Schaefer, Stefan Leutenegger

TL;DR
BodySLAM++ is a real-time visual-inertial system that estimates human and camera states simultaneously. It is more accurate than baseline methods on both tasks and fast enough for real-world applications.
Contribution
It extends an existing visual-inertial state estimation framework, OKVIS2, to jointly estimate human and camera states, improving accuracy while running in real time.
Findings
Improves human state estimation accuracy by 26% over baseline methods.
Improves camera state estimation accuracy by 12% over baseline methods.
Runs in real time at 15+ frames per second on an Intel i7-model CPU.
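To make these figures concrete, here is a minimal sketch of the arithmetic behind them. It assumes the percentages are relative error reductions against a baseline, which this summary does not state explicitly. The function names and error values are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Fractional error reduction relative to a baseline (0.26 means 26% lower error)."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


def frame_budget_ms(fps: float) -> float:
    """Average processing time available per frame at a given frame rate, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    # Hypothetical error values chosen only to illustrate a 26% reduction.
    baseline_error = 100.0
    new_error = 74.0
    print(f"relative improvement: {relative_improvement(baseline_error, new_error):.0%}")
    # Running at 15 frames per second leaves about 66.7 ms per frame on average.
    print(f"frame budget at 15 fps: {frame_budget_ms(15):.1f} ms")
```

Under this reading, running at 15+ frames per second means the whole estimation pipeline has to finish in roughly 67 ms or less per frame on average.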
Abstract
Robust, fast, and accurate human state - 6D pose and posture - estimation remains a challenging problem. For real-world applications, the ability to estimate the human state in real-time is highly desirable. In this paper, we present BodySLAM++, a fast, efficient, and accurate human and camera state estimation framework relying on visual-inertial data. BodySLAM++ extends an existing visual-inertial state estimation framework, OKVIS2, to solve the dual task of estimating camera and human states simultaneously. Our system improves the accuracy of both human and camera state estimation with respect to baseline methods by 26% and 12%, respectively, and achieves real-time performance at 15+ frames per second on an Intel i7-model CPU. Experiments were conducted on a custom dataset containing both ground truth human and camera poses collected with an indoor motion tracking system.
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Hand Gesture Recognition Systems
