Motion Capture from Inertial and Vision Sensors

Xiaodong Chen; Wu Liu; Qian Bao; Xinchen Liu; Ruoli Dai; Yongdong Zhang; Tao Mei

arXiv:2407.16341·cs.CV·April 6, 2026

TL;DR

This paper introduces MINIONS, a large-scale multi-modal motion capture dataset combining inertial sensors and RGB videos, and proposes SparseNet for accurate human motion capture using minimal sensors and a monocular camera.

Contribution

The paper presents MINIONS, a new large-scale dataset, and SparseNet, a framework for multi-modal human motion capture from a few IMUs and a monocular camera.

Findings

01 The MINIONS dataset contains over five million frames and 146 fine-grained actions.

02 SparseNet combines data from a few IMUs with monocular RGB video for motion capture.

03 Results demonstrate the potential for consumer-friendly motion capture solutions.

Abstract

Human motion capture is the foundation for many computer vision and graphics tasks. While industrial motion capture systems with complex camera arrays or expensive wearable sensors have been widely adopted in movie and game production, consumer-affordable and easy-to-use solutions for personal applications are still far from mature. To enable accurate multi-modal human motion capture in daily life using a monocular camera together with very few inertial measurement units (IMUs), we contribute MINIONS, a large-scale Motion capture dataset collected from INertial and visION Sensors. MINIONS has several featured properties: 1) a large scale of over five million frames and 400 minutes in duration; 2) multi-modal data comprising IMU signals and RGB videos, labeled with joint positions, joint rotations, SMPL parameters, etc.; 3) a diverse set of 146 fine-grained single and interactive…
