TL;DR
Lite3R is a model-agnostic framework that makes transformer-based 3D reconstruction more efficient by reducing attention overhead and enabling stable low-precision deployment, cutting latency by 1.7-2.0x and memory usage by 1.9-2.4x.
Contribution
Lite3R introduces Sparse Linear Attention and FP8-aware quantization-aware training, making 3D reconstruction efficient and stable without retraining entire models.
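To illustrate why replacing dense attention can reduce cost, the sketch below implements generic kernelized linear attention in plain Python. It is not the paper's Sparse Linear Attention, whose details are not given in this summary. The `feature_map` (ELU + 1) and all function names are assumptions chosen for illustration. The point is that, by associativity, keys and values can be summarized once in a small `d x dv` matrix, so the cost grows linearly with the number of tokens N instead of quadratically.

```python
import math

def feature_map(x):
    # Hypothetical positive feature map: elu(v) + 1 for each component.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dense_kernel_attention(Q, K, V):
    # Reference form: builds all N x N weights explicitly, O(N^2 * d).
    out = []
    for q in Q:
        pq = feature_map(q)
        weights = [dot(pq, feature_map(k)) for k in K]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out

def linear_attention(Q, K, V):
    # Same result, but keys/values are summarized once, O(N * d * dv).
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # S = sum_n phi(k_n) v_n^T
    z = [0.0] * d                       # z = sum_n phi(k_n)
    for k, v in zip(K, V):
        pk = feature_map(k)
        for i in range(d):
            z[i] += pk[i]
            for j in range(dv):
                S[i][j] += pk[i] * v[j]
    out = []
    for q in Q:
        pq = feature_map(q)
        denom = dot(pq, z)
        out.append([sum(pq[i] * S[i][j] for i in range(d)) / denom
                    for j in range(dv)])
    return out

# Tiny usage example with three tokens of dimension two.
Q = [[0.1, -0.2], [0.5, 0.3], [-0.4, 0.9]]
K = [[0.3, 0.1], [-0.6, 0.2], [0.8, -0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
result = linear_attention(Q, K, V)
```

Both functions compute the same normalized weighted average of the value rows; only the order of operations differs. A sparse variant would additionally keep exact attention for a selected subset of token pairs, which this sketch does not attempt.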
Findings
Reduces latency by 1.7-2.0x and memory usage by 1.9-2.4x
Maintains competitive reconstruction quality
Enables low-precision deployment with pretrained priors
Abstract
Transformer-based 3D reconstruction has emerged as a powerful paradigm for recovering geometry and appearance from multi-view observations, offering strong performance across challenging visual conditions. As these models scale to larger backbones and higher-resolution inputs, improving their efficiency becomes increasingly important for practical deployment. However, modern 3D transformer pipelines face two coupled challenges: dense multi-view attention creates substantial token-mixing overhead, and low-precision execution can destabilize geometry-sensitive representations and degrade depth, pose, and 3D consistency. To address the first challenge, we propose Lite3R, a model-agnostic teacher-student framework that replaces dense attention with Sparse Linear Attention to preserve important geometric interactions while reducing attention cost. To address the second challenge, we…
