Mo2Cap2: Real-time Mobile 3D Motion Capture with a Cap-mounted Fisheye Camera
Weipeng Xu, Avishek Chatterjee, Michael Zollhoefer, Helge Rhodin, Pascal Fua, Hans-Peter Seidel, Christian Theobalt

TL;DR
This paper introduces Mo2Cap2, a real-time egocentric 3D human pose estimation system built on a single cap-mounted fisheye camera. It combines a lightweight hardware setup, a large training corpus of top-down fisheye images, and a disentangled CNN-based pose estimation approach that runs at 60Hz on a consumer-level GPU.
Contribution
The paper presents the first real-time egocentric 3D pose estimation method with a lightweight cap-mounted fisheye camera and a new disentangled CNN model, along with a large annotated dataset.
Findings
Achieves 60Hz pose estimation on a consumer GPU.
Lower 3D joint error compared to baselines.
Better 2D overlay accuracy than existing methods.
Abstract
We propose the first real-time approach for the egocentric estimation of 3D human body pose in a wide range of unconstrained everyday activities. This setting has a unique set of challenges, such as mobility of the hardware setup, and robustness to long capture sessions with fast recovery from tracking failures. We tackle these challenges based on a novel lightweight setup that converts a standard baseball cap to a device for high-quality pose estimation based on a single cap-mounted fisheye camera. From the captured egocentric live stream, our CNN based 3D pose estimation approach runs at 60Hz on a consumer-level GPU. In addition to the novel hardware setup, our other main contributions are: 1) a large ground truth training corpus of top-down fisheye images and 2) a novel disentangled 3D pose estimation approach that takes the unique properties of the egocentric viewpoint into account.…
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Video Surveillance and Tracking Methods
