Fusing Monocular Images and Sparse IMU Signals for Real-time Human Motion Capture
Shaohua Pan, Qi Ma, Xinyu Yi, Weifeng Hu, Xiong Wang, Xingkang Zhou, Jijunnan Li, and Feng Xu

TL;DR
This paper introduces a novel real-time human motion capture method that fuses monocular images with sparse IMU signals, leveraging dual coordinate strategies and feedback mechanisms to improve robustness and accuracy over existing approaches.
Contribution
It proposes a dual-branch fusion framework with coordinate transformations and feedback to effectively combine visual and inertial data for enhanced motion capture.
Findings
Outperforms state-of-the-art methods in global orientation estimation
Achieves superior local pose accuracy in diverse conditions
Demonstrates robustness in extreme input scenarios
Abstract
Either RGB images or inertial signals have been used for the task of motion capture (mocap), but combining them together is a new and interesting topic. We believe that the combination is complementary and able to solve the inherent difficulties of using one modality input, including occlusions, extreme lighting/texture, and out-of-view for visual mocap and global drifts for inertial mocap. To this end, we propose a method that fuses monocular images and sparse IMUs for real-time human motion capture. Our method contains a dual coordinate strategy to fully explore the IMU signals with different goals in motion capture. To be specific, besides one branch transforming the IMU signals to the camera coordinate system to combine with the image information, there is another branch to learn from the IMU signals in the body root coordinate system to better estimate body poses. Furthermore, a…
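The dual coordinate strategy described in the abstract can be illustrated with a minimal sketch of the two frame changes involved. This is not the paper's implementation: the function names, the assumption that all IMU readings are already calibrated into a shared world frame, the use of a known camera rotation `R_cam_world`, the choice of index 0 as the root sensor, and the subtraction of the root acceleration are all assumptions made for illustration.

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def to_camera_frame(R_world_imu, acc_world, R_cam_world):
    """Express IMU orientations (N, 3, 3) and accelerations (N, 3) in the camera frame.

    R_cam_world is an assumed, known world-to-camera rotation.
    """
    R_cam_imu = R_cam_world @ R_world_imu   # broadcasts over the N sensors
    acc_cam = acc_world @ R_cam_world.T     # row-vector form of R_cam_world @ a
    return R_cam_imu, acc_cam

def to_root_frame(R_world_imu, acc_world, root_index=0):
    """Express IMU readings relative to the root sensor (assumed to be at root_index)."""
    R_world_root = R_world_imu[root_index]
    R_root_imu = R_world_root.T @ R_world_imu
    # Assumption: accelerations are taken relative to the root, then rotated.
    acc_root = (acc_world - acc_world[root_index]) @ R_world_root
    return R_root_imu, acc_root

# Hypothetical readings from two sensors; the first one is the root.
R_world_imu = np.stack([rot_z(0.3), rot_z(1.0)])
acc_world = np.array([[0.0, 0.0, 9.8], [1.0, 0.0, 9.8]])
R_cam_world = rot_z(-0.5)

R_cam_imu, acc_cam = to_camera_frame(R_world_imu, acc_world, R_cam_world)
R_root_imu, acc_root = to_root_frame(R_world_imu, acc_world)
```

In this sketch the camera-frame outputs would feed the branch that is combined with image information, while the root-frame outputs are independent of the camera and of the subject's global heading, which is the property the abstract associates with better body pose estimation.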
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Human Pose and Action Recognition
