HAMSt3R: Human-Aware Multi-view Stereo 3D Reconstruction

Sara Rojas; Matthieu Armando; Bernard Ghanem; Philippe Weinzaepfel; Vincent Leroy; Gregory Rogez

arXiv:2508.16433 · cs.CV · August 25, 2025

TL;DR

HAMSt3R is a novel multi-view stereo method that jointly reconstructs humans and scenes from sparse, uncalibrated images, improving accuracy and efficiency in human-centric 3D reconstruction tasks.

Contribution

HAMSt3R introduces a multi-head network built on a distilled image encoder (DUNE) for joint human and scene 3D reconstruction, bridging the gap between human and scene understanding.
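
The multi-head idea can be illustrated with a toy sketch: one shared encoder produces features once, and each head reads those same features to produce its own output. This is not the authors' implementation. The class name `MultiHeadSketch`, the head names `pointmap` and `human_mask`, the layer sizes, and the random weights are all hypothetical and chosen only to show the structure.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Return a random (out_dim x in_dim) weight matrix as nested lists."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, vec):
    """Multiply a weight matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class MultiHeadSketch:
    """Toy stand-in: one shared encoder feeding several per-pixel heads."""

    def __init__(self, in_dim=3, feat_dim=8, seed=0):
        rng = random.Random(seed)
        self.encoder = make_linear(in_dim, feat_dim, rng)
        # Head names and output sizes are illustrative assumptions.
        self.heads = {
            "pointmap": make_linear(feat_dim, 3, rng),    # per-pixel 3D point
            "human_mask": make_linear(feat_dim, 1, rng),  # per-pixel person logit
        }

    def forward(self, pixels):
        # Encode every pixel once (linear layer + ReLU); all heads reuse the features.
        feats = [[max(0.0, v) for v in apply_linear(self.encoder, p)] for p in pixels]
        return {
            name: [apply_linear(weights, f) for f in feats]
            for name, weights in self.heads.items()
        }


model = MultiHeadSketch()
image = [[0.2, 0.4, 0.6], [0.9, 0.1, 0.3]]  # two RGB "pixels"
outputs = model.forward(image)
```

Because the encoder runs only once per image, adding a further head costs one extra projection rather than a second backbone pass, which is the usual motivation for this kind of layout.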

Findings

01. Effective reconstruction of humans in diverse scenarios

02. Strong generalization to traditional multi-view stereo tasks

03. Efficient, fully feed-forward architecture

Abstract

Recovering the 3D geometry of a scene from a sparse set of uncalibrated images is a long-standing problem in computer vision. While recent learning-based approaches such as DUSt3R and MASt3R have demonstrated impressive results by directly predicting dense scene geometry, they are primarily trained on outdoor scenes with static environments and struggle to handle human-centric scenarios. In this work, we introduce HAMSt3R, an extension of MASt3R for joint human and scene 3D reconstruction from sparse, uncalibrated multi-view images. First, we exploit DUNE, a strong image encoder obtained by distilling, among others, the encoders from MASt3R and from a state-of-the-art Human Mesh Recovery (HMR) model, Multi-HMR, for a better understanding of scene geometry and human bodies. Our method then incorporates additional network heads to segment people, estimate dense correspondences via…

Taxonomy

Topics: Human Pose and Action Recognition · 3D Shape Modeling and Analysis · Advanced Vision and Imaging