Distill3R: A Pipeline for Democratizing 3D Foundation Models on Commodity Hardware
Brandon Leblanc, Charalambos Poullis

TL;DR
Distill3R distills large 3D foundation models into compact students that can be trained on a single workstation, making advanced 3D reconstruction accessible to smaller labs.
Contribution
The paper presents a distillation framework that pairs an offline caching pipeline with a confidence-aware loss, so that compact 3D reconstruction models can be trained on commodity hardware. The resulting student is substantially smaller and faster at inference than its teacher.
Findings
The 72M-parameter student has 9x fewer parameters than the teacher
Inference is 5x faster than the teacher
Training completes in under 3 days on a single workstation
Abstract
While multi-view 3D reconstruction has shifted toward large-scale foundation models capable of inferring globally consistent geometry, their reliance on massive computational clusters for training has created a significant barrier to entry for most academic laboratories. To bridge this compute divide, we introduce Distill3R, a framework designed to distill the geometric reasoning of 3D foundation models into compact students fully trainable on a single workstation. Our methodology centers on two primary innovations: (1) an offline caching pipeline that decouples heavy teacher inference from the training loop through compressed supervision signals, and (2) a confidence-aware distillation loss that leverages teacher uncertainty to enable training on commodity hardware. We propose a 72M-parameter student model which achieves a 9x reduction in parameters and a 5x inference speedup compared…
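The two ideas named in the abstract can be illustrated with a minimal sketch. The abstract does not give the cache format or the exact loss, so everything here is an assumption: the function names are hypothetical, the cache is modelled as float16 values compressed with zlib, and the loss is a plain confidence-weighted L1 with one confidence per predicted value. It is not the authors' implementation.

```python
import struct
import zlib


def pack_teacher_output(points, confidences):
    """Quantise teacher predictions to float16 and compress them for a cache.

    Hypothetical format: all point values followed by all confidences.
    """
    values = list(points) + list(confidences)
    raw = struct.pack(f"<{len(values)}e", *values)  # 'e' = IEEE half precision
    return zlib.compress(raw)


def unpack_teacher_output(blob, n_points):
    """Invert pack_teacher_output; returns (points, confidences) as lists."""
    raw = zlib.decompress(blob)
    values = struct.unpack(f"<{len(raw) // 2}e", raw)  # 2 bytes per float16
    return list(values[:n_points]), list(values[n_points:])


def confidence_weighted_l1(student, teacher, confidence, eps=1e-8):
    """Mean absolute error in which each term is weighted by teacher confidence.

    Values the teacher is unsure about (low confidence) contribute less.
    This is an assumed form, not the loss defined in the paper.
    """
    weighted = sum(c * abs(s - t) for s, t, c in zip(student, teacher, confidence))
    return weighted / (sum(confidence) + eps)


# The teacher runs once, offline; training then reads only the cached blob.
blob = pack_teacher_output([1.0, 2.0, 3.0], [1.0, 0.5, 0.0])
teacher_points, teacher_conf = unpack_teacher_output(blob, n_points=3)
loss = confidence_weighted_l1([1.0, 3.0, 10.0], teacher_points, teacher_conf)
```

In this toy case the third prediction is far from the teacher's value but carries zero confidence, so it does not affect the loss. Because the teacher is never called inside the training loop, its memory and compute cost is paid once, up front.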
Taxonomy
Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Robotics and Sensor-Based Localization
