Multi-View 3D Reconstruction using Knowledge Distillation

Aditya Dutt; Ishikaa Lunawat; Manpreet Kaur

arXiv:2412.02039 · cs.CV · February 20, 2026

TL;DR

This paper introduces a knowledge distillation approach for building efficient 3D reconstruction models, using Dust3r as the teacher. Vision Transformer students outperform CNN-based students.

Contribution

The paper proposes a student-teacher framework for 3D reconstruction and explores several student architectures and training strategies for efficiently replicating Dust3r's outputs.

Findings

01. The Vision Transformer student outperforms the CNN student in 3D reconstruction quality.

02. Starting from pre-trained models improves performance over training from scratch.

03. Ablation studies identify the key hyperparameters for the best results.
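The findings compare a CNN student with a Vision Transformer student, but this page does not say how the ViT turns its patch tokens into per-pixel 3D points. One common dense-prediction pattern, assumed here rather than taken from the paper, is to regress p×p×3 values per patch token and reassemble them into an H×W×3 pointmap. The sketch below shows only that bookkeeping in NumPy. `patchify`, `unpatchify`, the random `embed` and `head` matrices, and all sizes are hypothetical, and the transformer encoder itself is omitted.

```python
import numpy as np


def patchify(image: np.ndarray, p: int) -> np.ndarray:
    """Split an (H, W, C) array into (num_patches, p*p*C) tokens, row-major over patches."""
    h, w, c = image.shape
    assert h % p == 0 and w % p == 0, "H and W must be multiples of the patch size"
    x = image.reshape(h // p, p, w // p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/p, W/p, p, p, C)
    return x.reshape((h // p) * (w // p), p * p * c)


def unpatchify(tokens: np.ndarray, h: int, w: int, p: int, c: int) -> np.ndarray:
    """Inverse of patchify: (num_patches, p*p*c) -> (h, w, c)."""
    x = tokens.reshape(h // p, w // p, p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/p, p, W/p, p, C)
    return x.reshape(h, w, c)


rng = np.random.default_rng(0)
H, W, P, D = 32, 48, 16, 64                       # image size, patch size, token width (hypothetical)
image = rng.random((H, W, 3))
tokens = patchify(image, P)                       # (6, 768): 2 x 3 patches of 16 x 16 x 3 values
embed = rng.normal(size=(P * P * 3, D)) * 0.02    # hypothetical linear patch embedding
head = rng.normal(size=(D, P * P * 3)) * 0.02     # hypothetical per-patch regression head
hidden = tokens @ embed                           # transformer blocks would run here (omitted)
pointmap = unpatchify(hidden @ head, H, W, P, 3)  # one (x, y, z) value per pixel
print(pointmap.shape)  # (32, 48, 3)
```

The round trip `unpatchify(patchify(image, P), H, W, P, 3)` reproduces `image` exactly, which is a quick check that the two reshapes are true inverses.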

Abstract

Large foundation models like Dust3r can produce high-quality outputs such as pointmaps, camera intrinsics, and depth estimates, given stereo image pairs as input. However, applying these outputs to tasks like visual localization requires substantial inference time and compute resources. To address these limitations, we propose a knowledge distillation pipeline: a student-teacher model with Dust3r as the teacher, in which we explore multiple student architectures trained on the 3D points reconstructed by Dust3r. Our goal is to build student models that learn scene-specific representations and output 3D points with performance comparable to Dust3r. We train our models on the 12Scenes dataset. We test two main architectures of models: a CNN-based architecture and a Vision Transformer…
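The abstract describes students trained to regress the 3D points that Dust3r outputs for a scene. A minimal sketch of that distillation objective follows, assuming a per-pixel mean-squared 3D-distance loss and a deliberately tiny linear student. Both are assumptions: the paper's CNN and ViT students and its exact loss are not given here. Synthetic numbers stand in for 12Scenes images and Dust3r pointmaps, and `distill_linear_student` and `pointmap_mse` are hypothetical names.

```python
import numpy as np


def pointmap_mse(student_pts: np.ndarray, teacher_pts: np.ndarray) -> float:
    """Mean squared 3D distance between matching student and teacher points, shape (N, 3)."""
    return float(np.mean(np.sum((student_pts - teacher_pts) ** 2, axis=1)))


def distill_linear_student(features: np.ndarray, teacher_pts: np.ndarray,
                           lr: float = 0.05, steps: int = 2000):
    """Fit a hypothetical linear student W so that features @ W approximates teacher_pts.

    features:    (N, F) per-pixel input features (stand-in for image features)
    teacher_pts: (N, 3) per-pixel 3D points from the teacher (Dust3r in the paper)
    Returns the learned weights and the loss recorded before each update.
    """
    n, f = features.shape
    weights = np.zeros((f, 3))
    history = []
    for _ in range(steps):
        student_pts = features @ weights
        history.append(pointmap_mse(student_pts, teacher_pts))
        # Gradient of the mean over pixels of ||X W - T||^2 with respect to W.
        grad = (2.0 / n) * features.T @ (student_pts - teacher_pts)
        weights -= lr * grad
    return weights, history


# Synthetic stand-in for one scene: an 8x8 pointmap flattened to 64 pixels.
rng = np.random.default_rng(0)
features = rng.normal(size=(64, 16))
teacher_pts = features @ rng.normal(size=(16, 3))  # fake "teacher" output
weights, history = distill_linear_student(features, teacher_pts)
print(f"loss: {history[0]:.3f} -> {history[-1]:.2e}")
```

In the actual pipeline the linear map would be replaced by a CNN or ViT student, and the teacher points would come from running Dust3r on image pairs from a single scene. Because the student is fit per scene, it only has to represent that scene's geometry, which is the intuition behind using a much smaller and faster model than the teacher.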

Code & Models

Repositories

ishikaalunawat/231aproj (PyTorch, official)

Taxonomy

Topics: Image Processing and 3D Reconstruction · Robotics and Sensor-Based Localization · 3D Surveying and Cultural Heritage

Methods: Attention Is All You Need · Absolute Position Encodings · Adam · Softmax · Label Smoothing · Dropout · Sparse Evolutionary Training · Dense Connections · Layer Normalization · Knowledge Distillation