MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
Ruicheng Wang, Sicheng Xu, Cassie Dai, Jianfeng Xiang, Yu Deng, Xin Tong, Jiaolong Yang

TL;DR
MoGe is a novel model that accurately estimates 3D geometry from monocular images using affine-invariant representations and innovative supervision techniques, achieving superior results across multiple tasks.
Contribution
The paper introduces a new affine-invariant 3D point map representation and novel supervision methods that enhance monocular geometry estimation accuracy and generalizability.
Findings
Outperforms state-of-the-art methods on diverse datasets
Achieves high accuracy in 3D point map, depth, and FOV estimation
Demonstrates strong generalization across open-domain images
Abstract
We present MoGe, a powerful model for recovering 3D geometry from monocular open-domain images. Given a single image, our model directly predicts a 3D point map of the captured scene with an affine-invariant representation, which is agnostic to true global scale and shift. This new representation precludes ambiguous supervision in training and facilitates effective geometry learning. Furthermore, we propose a set of novel global and local geometry supervisions that empower the model to learn high-quality geometry. These include a robust, optimal, and efficient point cloud alignment solver for accurate global shape learning, and a multi-scale local geometry loss promoting precise local geometry supervision. We train our model on a large, mixed dataset and demonstrate its strong generalizability and high accuracy. In our comprehensive evaluation on diverse unseen datasets, our model…
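To make the affine-invariant idea concrete: a prediction that is agnostic to global scale and shift has to be aligned to the ground truth before the two can be compared. The sketch below is a minimal illustration under stated assumptions, not the paper's solver. It assumes a single scalar scale s and a 3D shift t, and it uses ordinary least squares in place of the robust, optimal solver the abstract mentions. The function name `align_scale_shift` and the synthetic data are hypothetical.

```python
import numpy as np


def align_scale_shift(pred, target):
    """Return (s, t) minimizing sum ||s * pred + t - target||^2.

    pred, target: (N, 3) arrays of corresponding 3D points.
    Assumption: scalar scale s and 3D shift t, plain least squares.
    This is NOT the robust solver described in the abstract.
    """
    pred_mean = pred.mean(axis=0)
    target_mean = target.mean(axis=0)
    # Center both point sets so the scale can be solved independently of the shift.
    pred_c = pred - pred_mean
    target_c = target - target_mean
    s = (pred_c * target_c).sum() / (pred_c ** 2).sum()
    t = target_mean - s * pred_mean
    return s, t


# Synthetic check: the "prediction" equals the ground truth up to scale and shift.
rng = np.random.default_rng(0)
gt = rng.normal(size=(1000, 3)) + np.array([0.0, 0.0, 5.0])
true_scale = 2.5
true_shift = np.array([0.1, -0.2, 1.5])
pred = (gt - true_shift) / true_scale  # so that gt == true_scale * pred + true_shift

s, t = align_scale_shift(pred, gt)
aligned = s * pred + t
max_error = np.abs(aligned - gt).max()
```

On noise-free data like this, the recovered `s` and `t` match the values used to generate `pred`, and `aligned` coincides with `gt` up to floating-point error. Least squares is sensitive to outliers, which is presumably one motivation for the robust solver the authors propose.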
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Medical Image Segmentation Techniques
Methods: Sparse Evolutionary Training
