Matrix3D: Large Photogrammetry Model All-in-One
Yuanxun Lu, Jingyang Zhang, Tian Fang, Jean-Daniel Nahmias, Yanghai Tsin, Long Quan, Xun Cao, Yao Yao, Shiwei Li

TL;DR
Matrix3D is a comprehensive photogrammetry model that unifies pose estimation, depth prediction, and view synthesis using a multi-modal diffusion transformer, leveraging mask learning to train on diverse incomplete data for improved 3D content creation.
Contribution
The paper introduces Matrix3D, a novel unified model employing a multi-modal diffusion transformer and mask learning strategy to perform multiple photogrammetry tasks with incomplete data.
Findings
Achieves state-of-the-art results in pose estimation.
Demonstrates superior novel view synthesis performance.
Enables fine-grained control for 3D content creation.
Abstract
We present Matrix3D, a unified model that performs several photogrammetry subtasks, including pose estimation, depth prediction, and novel view synthesis, all with a single model. Matrix3D utilizes a multi-modal diffusion transformer (DiT) to integrate transformations across several modalities, such as images, camera parameters, and depth maps. The key to Matrix3D's large-scale multi-modal training lies in the incorporation of a mask learning strategy. This enables full-modality model training even with partially complete data, such as bi-modality data of image-pose and image-depth pairs, and thus significantly increases the pool of available training data. Matrix3D demonstrates state-of-the-art performance in pose estimation and novel view synthesis tasks. Additionally, it offers fine-grained control through multi-round interactions, making it an innovative tool for 3D content creation.…
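The mask learning idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`sample_roles`, `masked_loss`), the per-modality granularity, and the `mask_prob` parameter are all assumptions made for illustration. The point it shows is that a modality absent from a dataset is treated as permanently masked and never supervised, so image-pose and image-depth pairs can feed the same full-modality model.

```python
import random

MODALITIES = ("image", "pose", "depth")

def sample_roles(sample, rng, mask_prob=0.5):
    """Assign each modality a role for one training step (hypothetical helper).

    `sample` maps modality name -> data, or None when the dataset lacks it.
    Roles: "input" (visible), "target" (masked and supervised),
    "missing" (masked and never supervised).
    """
    roles = {}
    for name in MODALITIES:
        if sample.get(name) is None:
            roles[name] = "missing"
        elif rng.random() < mask_prob:
            roles[name] = "target"
        else:
            roles[name] = "input"
    # Keep at least one available modality visible so the model has a condition.
    available = [m for m in MODALITIES if roles[m] != "missing"]
    if available and all(roles[m] == "target" for m in available):
        roles[rng.choice(available)] = "input"
    return roles

def masked_loss(per_modality_error, roles):
    """Average the error over supervised targets only; missing data is ignored."""
    terms = [per_modality_error[m] for m in MODALITIES if roles[m] == "target"]
    return sum(terms) / len(terms) if terms else 0.0

# Usage: an image-pose pair with no depth still yields a valid training step.
rng = random.Random(0)
image_pose_pair = {"image": "img_tensor", "pose": "camera_params", "depth": None}
roles = sample_roles(image_pose_pair, rng)
```

Under these assumptions, the same masking mechanism that hides targets during training also selects the task at inference: masking pose asks for pose estimation, masking depth asks for depth prediction, and masking an image asks for novel view synthesis.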
Taxonomy
Topics: Satellite Image Processing and Photogrammetry · 3D Surveying and Cultural Heritage · Remote Sensing and LiDAR Applications
Methods: Diffusion
