TL;DR
MAMMA is a markerless, multi-view motion-capture system that accurately estimates human motion, including close two-person interactions, from dense surface landmarks predicted by a model trained on synthetic data.
Contribution
Introduces a dense landmark prediction method for multi-person motion capture that handles occlusions and physical interactions and is trained on a large synthetic dataset.
Findings
Outperforms existing methods on sequences with complex person-to-person interactions.
Achieves accuracy comparable to marker-based systems.
Provides new benchmarks for dense-landmark prediction.
Abstract
We present MAMMA, a markerless motion-capture pipeline that accurately recovers SMPL-X parameters from multi-view video of two-person interaction sequences. Traditional motion-capture systems rely on physical markers. Although they offer high accuracy, their requirements of specialized hardware, manual marker placement, and extensive post-processing make them costly and time-consuming. Recent learning-based methods attempt to overcome these limitations, but most are designed for single-person capture, rely on sparse keypoints, or struggle with occlusions and physical interactions. In this work, we introduce a method that predicts dense 2D contact-aware surface landmarks conditioned on segmentation masks, enabling person-specific correspondence estimation even under heavy occlusion. We employ a novel architecture that exploits learnable queries for each landmark. We demonstrate that our…
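The abstract mentions learnable per-landmark queries and conditioning on segmentation masks without giving architectural details. The following is only a toy sketch of that general idea, not the authors' architecture: each landmark query attends over image features, attention is restricted to the pixels of one person's mask, and the 2D location is read out as the attention-weighted mean of pixel coordinates. The function names, the soft-argmax readout, the use of the mask as an attention mask, and the random stand-ins for learned queries and features are all assumptions made for illustration.

```python
import math
import random


def masked_softmax(scores, mask):
    """Softmax over positions where mask is True; masked-out positions get weight 0."""
    kept = [s for s, m in zip(scores, mask) if m]
    if not kept:
        raise ValueError("mask selects no positions")
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def predict_landmarks(queries, features, coords, mask):
    """Hypothetical single cross-attention step.

    queries:  K vectors of length D, one per landmark (learned in a real model)
    features: N vectors of length D, one per pixel (from an image backbone)
    coords:   N (x, y) pixel coordinates
    mask:     N booleans, True where the pixel belongs to the target person
    """
    dim = len(queries[0])
    landmarks = []
    for q in queries:
        # Scaled dot-product score between this landmark query and every pixel.
        scores = [sum(a * b for a, b in zip(q, f)) / math.sqrt(dim) for f in features]
        weights = masked_softmax(scores, mask)
        # Soft-argmax readout: attention-weighted mean of pixel coordinates.
        x = sum(w * c[0] for w, c in zip(weights, coords))
        y = sum(w * c[1] for w, c in zip(weights, coords))
        landmarks.append((x, y))
    return landmarks


# Toy 4x4 feature map; random values stand in for learned queries and features.
random.seed(0)
H, W, D, K = 4, 4, 8, 3
coords = [(float(x), float(y)) for y in range(H) for x in range(W)]
features = [[random.gauss(0, 1) for _ in range(D)] for _ in coords]
queries = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]
mask = [x < 2 for x, _ in coords]  # assume the target person covers the left half

landmarks = predict_landmarks(queries, features, coords, mask)
```

Because attention weights are zero outside the mask, every predicted landmark in this toy example falls within the masked region, which is one simple way a mask could make predictions person-specific when two people overlap in the image.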
