3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation
Seonho Lee, Jiho Choi, Inha Kang, Jiwook Kim, Junsung Park, Hyunjung Shim

TL;DR
This paper introduces Geometric Distillation, a lightweight fine-tuning method that enhances pretrained vision-language models with 3D spatial understanding by distilling geometric cues from existing 3D models, without altering the original architecture.
Contribution
The paper presents an annotation-free framework for injecting 3D geometric knowledge into VLMs, improving 3D reasoning at low cost while remaining compatible with existing models.
Findings
Outperforms prior methods on 3D vision-language reasoning and 3D perception benchmarks.
Achieves better 3D spatial reasoning at significantly lower computational cost than prior approaches.
Effectively integrates geometric cues without architecture modifications.
Abstract
Vision-Language Models (VLMs) have shown remarkable performance on diverse visual and linguistic tasks, yet they remain fundamentally limited in their understanding of 3D spatial structures. We propose Geometric Distillation, a lightweight, annotation-free fine-tuning framework that injects human-inspired geometric cues into pretrained VLMs without modifying their architecture. By distilling (1) sparse correspondences, (2) relative depth relations, and (3) dense cost volumes from off-the-shelf 3D foundation models (e.g., MASt3R, VGGT), our method shapes representations to be geometry-aware while remaining compatible with natural image-text inputs. Through extensive evaluations on 3D vision-language reasoning and 3D perception benchmarks, our method consistently outperforms prior approaches, achieving improved 3D spatial reasoning with significantly lower computational cost. Our work…
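The abstract names three distillation signals, but this summary does not give their loss formulations. As a rough illustration only, the sketch below shows one plausible way each signal could become a training loss on student (VLM) features, given targets from a 3D teacher. Everything here is an assumption rather than the authors' implementation: the function names, the choice of InfoNCE for correspondences, a logistic ranking loss for relative depth, a row-wise KL divergence for the cost volume, and the temperature value.

```python
import math

# Illustrative only: loss forms, names, and defaults are assumptions,
# not the formulation used in the paper.

def _log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def _cosine(a, b, eps=1e-8):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)

def correspondence_loss(feats_a, feats_b, matches, temperature=0.1):
    """(1) Sparse correspondences: InfoNCE over teacher-provided matches.

    For each match (i, j), student feature i in view A should be more
    similar to feature j in view B than to any other feature in view B.
    """
    total = 0.0
    for i, j in matches:
        logits = [_cosine(feats_a[i], fb) / temperature for fb in feats_b]
        total += -_log_softmax(logits)[j]
    return total / len(matches)

def relative_depth_loss(student_scores, teacher_depths, pairs):
    """(2) Relative depth: logistic ranking loss on point pairs.

    The student only has to reproduce the teacher's depth ordering,
    not its metric depth values.
    """
    total = 0.0
    for i, j in pairs:
        sign = 1.0 if teacher_depths[i] > teacher_depths[j] else -1.0
        margin = sign * (student_scores[i] - student_scores[j])
        total += math.log1p(math.exp(-margin))
    return total / len(pairs)

def cost_volume_loss(student_sim, teacher_sim):
    """(3) Dense cost volume: row-wise KL(teacher || student).

    Each row holds similarity scores of one source location against
    all target locations; rows are normalized with a softmax.
    """
    total = 0.0
    for s_row, t_row in zip(student_sim, teacher_sim):
        log_q = _log_softmax(s_row)
        log_p = _log_softmax(t_row)
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(teacher_sim)
```

In a fine-tuning loop, terms like these would be combined with some weighting and backpropagated through the VLM; the weights and the exact way teacher targets are extracted from MASt3R or VGGT are not specified in this summary.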
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Constraint Satisfaction and Optimization
