HAMSt3R: Human-Aware Multi-view Stereo 3D Reconstruction
Sara Rojas, Matthieu Armando, Bernard Ghanem, Philippe Weinzaepfel, Vincent Leroy, Gregory Rogez

TL;DR
HAMSt3R is a novel multi-view stereo method that jointly reconstructs humans and scenes from sparse, uncalibrated images, improving accuracy and efficiency in human-centric 3D reconstruction tasks.
Contribution
HAMSt3R adds human-specific prediction heads to a MASt3R-based network built on the distilled DUNE encoder, so that a single multi-head model reconstructs both people and the surrounding scene in 3D.
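To make the "shared encoder, several heads" idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The class name `ToyMultiHeadNet`, the patch size, the feature width, the head names `points` and `human_mask`, and the random weights are all hypothetical, and the toy processes each view independently, with no interaction between views.

```python
import numpy as np

rng = np.random.default_rng(0)


def linear(in_dim, out_dim):
    """Random weights standing in for trained parameters (hypothetical)."""
    return rng.standard_normal((in_dim, out_dim)) * 0.02


class ToyMultiHeadNet:
    """Toy shared encoder feeding several per-pixel heads (illustration only)."""

    def __init__(self, patch=8, feat_dim=32):
        self.patch = patch
        self.encoder = linear(patch * patch * 3, feat_dim)
        # Hypothetical heads: a 3-channel point map and a 1-channel person mask.
        self.heads = {
            "points": linear(feat_dim, patch * patch * 3),
            "human_mask": linear(feat_dim, patch * patch * 1),
        }

    def encode(self, image):
        """Split an (H, W, 3) image into patches and project each to a feature."""
        h, w, c = image.shape
        p = self.patch
        patches = image.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
        tokens = patches.reshape(h // p, w // p, p * p * c)
        return np.tanh(tokens @ self.encoder)

    def forward(self, image):
        """Run the shared encoder once, then every head on the same features."""
        h, w, _ = image.shape
        p = self.patch
        feats = self.encode(image)
        outputs = {}
        for name, weight in self.heads.items():
            channels = weight.shape[1] // (p * p)
            y = (feats @ weight).reshape(h // p, w // p, p, p, channels)
            # Undo the patch layout to get a dense (H, W, channels) map.
            outputs[name] = y.transpose(0, 2, 1, 3, 4).reshape(h, w, channels)
        return outputs


net = ToyMultiHeadNet()
views = [rng.random((64, 64, 3)) for _ in range(2)]  # two stand-in input views
outputs = [net.forward(view) for view in views]
```

The encoder runs once per image and every head reads the same features, which is why adding a task only costs one extra head. Image height and width must be multiples of the patch size in this sketch.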
Findings
Effective reconstruction of humans in diverse scenarios
Strong generalization to traditional multi-view stereo tasks
Efficient, fully feed-forward architecture
Abstract
Recovering the 3D geometry of a scene from a sparse set of uncalibrated images is a long-standing problem in computer vision. While recent learning-based approaches such as DUSt3R and MASt3R have demonstrated impressive results by directly predicting dense scene geometry, they are primarily trained on outdoor scenes with static environments and struggle to handle human-centric scenarios. In this work, we introduce HAMSt3R, an extension of MASt3R for joint human and scene 3D reconstruction from sparse, uncalibrated multi-view images. First, we exploit DUNE, a strong image encoder obtained by distilling, among others, the encoders from MASt3R and from a state-of-the-art Human Mesh Recovery (HMR) model, multi-HMR, for a better understanding of scene geometry and human bodies. Our method then incorporates additional network heads to segment people, estimate dense correspondences via…
Taxonomy
Topics: Human Pose and Action Recognition · 3D Shape Modeling and Analysis · Advanced Vision and Imaging
