Monocular Models are Strong Learners for Multi-View Human Mesh Recovery
Haoyu Xie, Shengkai Xu, Cheng Guo, Muhammad Usama Saleem, Wenhan Wu, Chen Chen, Ahmed Helmy, Pu Wang, Hongfei Xue

TL;DR
This paper introduces a training-free, multi-view human mesh recovery framework that leverages single-view models and test-time optimization to achieve state-of-the-art results without multi-view training data.
Contribution
The paper proposes a calibration-free, training-free approach to multi-view human mesh recovery: pretrained single-view HMR models serve as priors, and their predictions are refined with test-time optimization.
Findings
Achieves state-of-the-art performance on standard benchmarks.
Outperforms models trained with explicit multi-view supervision.
Eliminates the need for multi-view training data.
Abstract
Multi-view human mesh recovery (HMR) is broadly deployed in diverse domains where high accuracy and strong generalization are essential. Existing approaches can be broadly grouped into geometry-based and learning-based methods. However, geometry-based methods (e.g., triangulation) rely on cumbersome camera calibration, while learning-based approaches often generalize poorly to unseen camera configurations due to the lack of multi-view training data, limiting their performance in real-world scenarios. To enable calibration-free reconstruction that generalizes to arbitrary camera setups, we propose a training-free framework that leverages pretrained single-view HMR models as strong priors, eliminating the need for multi-view training data. Our method first constructs a robust and consistent multi-view initialization from single-view predictions, and then refines it via test-time…
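To make concrete why the abstract calls geometry-based methods calibration-dependent, here is a minimal sketch of classical linear (DLT) triangulation. It is not the paper's method. It is the kind of baseline the abstract contrasts against. The function names `triangulate_dlt` and `project`, the intrinsics `K`, and the two camera poses are hypothetical values chosen for illustration. Note that every view requires a full 3x4 projection matrix, which is exactly what camera calibration has to supply.

```python
import numpy as np

def triangulate_dlt(projections, points_2d):
    """Triangulate one 3D point from two or more views with linear DLT.

    projections: list of 3x4 camera matrices P = K [R | t] (requires calibration)
    points_2d:   list of (u, v) pixel observations, one per view
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point to pixel coordinates with camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical calibrated two-camera rig (illustrative values only).
K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
# Second camera shifted one unit along +x, so t = -R @ C = (-1, 0, 0).
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 4.0])  # a synthetic 3D joint position
X_est = triangulate_dlt([P1, P2], [project(P1, X_true), project(P2, X_true)])
```

With noise-free observations the recovered point matches the synthetic one up to floating-point error. Without `P1` and `P2` the linear system cannot be formed at all, which is the dependency a calibration-free approach aims to remove.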
