TL;DR
MOCHI is a multi-view 3D face reconstruction framework trained without registered data. It uses topological consistency and dense keypoints as supervision and reaches high accuracy with a test-time optimization scheme.
Contribution
It introduces a registration-free training method for multi-view face reconstruction that combines topological constraints, dense keypoints from a landmark predictor trained on synthetic data, and a test-time optimization step that further refines the predictions.
Findings
MOCHI outperforms traditional registration-based pipelines in accuracy.
Pointmap- and normal-based losses improve training stability and reconstruction quality.
Test-time optimization refines network weights for better visual fidelity.
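One plausible reading of the second finding: a point-to-surface distance needs a closest-point query against the predicted mesh, and the closest triangle can switch from one iteration to the next, whereas a pointmap loss compares 3D points stored at the same pixel of two maps, so the pairing is fixed by the image grid. The sketch below assumes per-pixel pointmaps and normal maps of shape H × W × 3 plus a validity mask; the function names, the L1 point distance, and the 1 − cosine normal term are illustrative choices, not MOCHI's published formulation.

```python
import numpy as np

def pointmap_loss(pred_pts, target_pts, mask):
    """Mean per-pixel L1 distance between two pointmaps (hypothetical form).

    pred_pts, target_pts: (H, W, 3) arrays holding one 3D point per pixel.
    mask: (H, W) boolean array marking pixels valid in both maps.
    """
    diff = np.abs(pred_pts - target_pts).sum(axis=-1)  # per-pixel L1 over x, y, z
    return diff[mask].mean()

def normal_loss(pred_n, target_n, mask, eps=1e-8):
    """Mean (1 - cosine similarity) between per-pixel normals (hypothetical form)."""
    pred_u = pred_n / np.maximum(np.linalg.norm(pred_n, axis=-1, keepdims=True), eps)
    tgt_u = target_n / np.maximum(np.linalg.norm(target_n, axis=-1, keepdims=True), eps)
    cos = (pred_u * tgt_u).sum(axis=-1)                # cosine per pixel
    return (1.0 - cos)[mask].mean()

# Usage on a tiny 4 x 4 map where every coordinate is off by 0.1.
H, W = 4, 4
mask = np.ones((H, W), dtype=bool)
target_pts = np.zeros((H, W, 3))
pred_pts = target_pts + 0.1
print(pointmap_loss(pred_pts, target_pts, mask))       # ≈ 0.3
```

Because both terms are simple per-pixel reductions, their gradients stay smooth as long as the mask does not change, which is one reason such losses are often preferred over nearest-surface queries.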
Abstract
Recent frameworks like ToFu and TEMPEH provide an automated alternative to classical registration pipelines by predicting 3D meshes in dense semantic correspondence directly from calibrated multi-view images. However, these learning-based methods rely on the slow, manual registration pipelines they aim to replace for their training supervision. We overcome this limitation with MOCHI (Multi-view Optimizable Correspondence of Heads from Images), a multi-view 3D face prediction framework trained without requiring registered training data. MOCHI eliminates the registration data dependency by enforcing topological consistency through a pseudo-linear inverse kinematic solver. Semantic alignment is guided by dense keypoints from a 2D landmark predictor trained exclusively on synthetic data. Our analysis further reveals that standard point-to-surface distances induce training instabilities and…
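The abstract says semantic alignment is guided by dense 2D keypoints in calibrated views. A common way to turn such keypoints into a training signal is a multi-view reprojection error: project the 3D positions of the corresponding mesh points with each camera's 3 × 4 matrix and compare them with the detected 2D keypoints. The sketch assumes that form; `project`, `multiview_keypoint_loss`, the optional confidence weights, and the toy camera are hypothetical, and MOCHI's actual keypoint term may differ.

```python
import numpy as np

def project(points_3d, P):
    """Project (N, 3) world points with a 3x4 camera matrix P = K [R | t]."""
    homo = np.hstack([points_3d, np.ones((points_3d.shape[0], 1))])  # (N, 4)
    uvw = homo @ P.T                                                 # (N, 3)
    return uvw[:, :2] / uvw[:, 2:3]                                  # perspective divide

def multiview_keypoint_loss(kp3d, cams, kp2d, conf=None):
    """Weighted mean 2D reprojection error over all views (hypothetical form).

    kp3d: (N, 3) 3D positions of the N keypoints on the predicted mesh.
    cams: list of V calibrated (3, 4) projection matrices.
    kp2d: (V, N, 2) keypoints predicted by the 2D detector in each view.
    conf: optional (V, N) detector confidences used as weights.
    """
    errs, weights = [], []
    for v, P in enumerate(cams):
        err = np.linalg.norm(project(kp3d, P) - kp2d[v], axis=-1)  # (N,) pixel error
        w = np.ones_like(err) if conf is None else conf[v]
        errs.append(w * err)
        weights.append(w)
    return np.sum(errs) / np.sum(weights)

# Toy setup: identity intrinsics and rotation, camera 5 units from the origin.
P = np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])])
kp3d = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
kp2d = np.stack([project(kp3d, P), project(kp3d, P)])  # two identical views
loss = multiview_keypoint_loss(kp3d, [P, P], kp2d)       # perfect match
```

Weighting by detector confidence lets unreliable keypoints (for example, occluded ones in side views) contribute less without being discarded outright.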
