Multi-View 3D Reconstruction using Knowledge Distillation
Aditya Dutt, Ishikaa Lunawat, Manpreet Kaur

TL;DR
This paper uses knowledge distillation to build efficient 3D reconstruction models, with Dust3r as the teacher; Vision Transformer students outperform CNN students.
Contribution
It proposes a novel student-teacher framework for 3D reconstruction, exploring different architectures and training strategies to replicate Dust3r's outputs efficiently.
Findings
The Vision Transformer student outperforms the CNN student in 3D reconstruction quality
Pre-trained models improve performance over models trained from scratch
Ablation studies identify key hyperparameters for optimal results
Abstract
Large foundation models like Dust3r can produce high-quality outputs such as pointmaps, camera intrinsics, and depth estimates, given stereo image pairs as input. However, applying these outputs to tasks like visual localization requires a large amount of inference time and compute resources. To address these limitations, we propose a knowledge distillation pipeline: a student-teacher setup with Dust3r as the teacher, in which we explore multiple student architectures trained on the 3D reconstructed points output by Dust3r. Our goal is to build student models that learn scene-specific representations and output 3D points with performance comparable to Dust3r. We train our models on the 12Scenes dataset. We test two main architectures of models: a CNN-based architecture and a Vision Transformer…
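To make the student-teacher idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes the simplest form of output distillation: a frozen teacher produces a 3D point for each pixel, and a student is fitted by gradient descent to minimize the mean squared distance to those points. Everything named here is hypothetical: `teacher_pointmap` is a toy stand-in for Dust3r, the linear student replaces the CNN and Vision Transformer students, and the squared-error loss is an assumption because the summary does not state the paper's loss.

```python
import random


def teacher_pointmap(u, v):
    """Hypothetical frozen teacher: maps a pixel (u, v) to a 3D point.

    Toy stand-in for Dust3r's pointmap output, not the real model.
    """
    return (0.5 * u + 0.1, -0.3 * v + 0.2, 0.2 * u + 0.4 * v + 1.0)


def student_predict(weights, u, v):
    """Tiny linear student: each output coordinate is w_u*u + w_v*v + b."""
    return tuple(w[0] * u + w[1] * v + w[2] for w in weights)


def distillation_loss(weights, pixels):
    """Mean squared distance between student and teacher 3D points."""
    total = 0.0
    for u, v in pixels:
        pred = student_predict(weights, u, v)
        target = teacher_pointmap(u, v)
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
    return total / len(pixels)


def train_student(pixels, steps=2000, lr=0.3):
    """Fit the student to the teacher's outputs with full-batch gradient descent."""
    weights = [[0.0, 0.0, 0.0] for _ in range(3)]
    n = len(pixels)
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0] for _ in range(3)]
        for u, v in pixels:
            pred = student_predict(weights, u, v)
            target = teacher_pointmap(u, v)  # teacher is only queried, never updated
            for k in range(3):
                err = 2.0 * (pred[k] - target[k]) / n
                grads[k][0] += err * u
                grads[k][1] += err * v
                grads[k][2] += err
        for k in range(3):
            for j in range(3):
                weights[k][j] -= lr * grads[k][j]
    return weights


rng = random.Random(0)
pixels = [(rng.random(), rng.random()) for _ in range(64)]
initial_loss = distillation_loss([[0.0] * 3 for _ in range(3)], pixels)
student_weights = train_student(pixels)
final_loss = distillation_loss(student_weights, pixels)
```

Comparing `initial_loss` with `final_loss` shows the student converging toward the teacher's outputs. In a realistic setting the student would be a neural network taking images as input, and the targets would be pointmaps precomputed by the teacher on the training scenes.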
Taxonomy
Topics: Image Processing and 3D Reconstruction · Robotics and Sensor-Based Localization · 3D Surveying and Cultural Heritage
Methods: Attention Is All You Need · Absolute Position Encodings · Adam · Softmax · Label Smoothing · Dropout · Sparse Evolutionary Training · Dense Connections · Layer Normalization · Knowledge Distillation
