Breaking the Geometric Bottleneck: Contrastive Expansion in Asymmetric Cross-Modal Distillation
Kabir Thayani

TL;DR
This paper studies the dimensional collapse that arises in asymmetric cross-modal knowledge distillation and proposes contrastive expansion to restore representation capacity and robustness in CNNs distilled from Vision Transformers.
Contribution
It introduces a contrastive expansion method using InfoNCE to overcome geometric collapse in asymmetric distillation, enhancing CNN capacity and noise immunity.
Findings
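Contrastive expansion (an auxiliary InfoNCE objective) expands the student's manifold by 2.4x, to ~38 effective dimensions
Standard cosine distillation collapses representations to an Effective Rank of ~16 on CIFAR-10
Contrastive expansion is required to reach the CNN's topological capacity limit of ~82 dimensions, even with a DINOv2 teacher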
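The Effective Rank figures above can be made concrete with a short sketch. The abstract does not spell out its exact formula, so this assumes the common entropy-based definition (the exponential of the Shannon entropy of the normalized singular values), computed after mean-centering the features. The function name `effective_rank` and the synthetic data are illustrative and not taken from the paper.

```python
import numpy as np

def effective_rank(features, eps=1e-12):
    """Entropy-based effective rank of an (n_samples, n_dims) feature matrix.

    Assumption: the singular values are normalized to sum to 1 and the
    result is exp(Shannon entropy). The paper may use a different variant.
    """
    x = np.asarray(features, dtype=np.float64)
    # Center each feature dimension so a constant offset is not counted
    # as a direction of variation.
    x = x - x.mean(axis=0, keepdims=True)
    s = np.linalg.svd(x, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # drop numerically zero directions before taking logs
    entropy = -(p * np.log(p)).sum()
    return float(np.exp(entropy))

# Synthetic illustration: a 64-dim embedding that only spans 4 directions
# versus one that uses all 64.
rng = np.random.default_rng(0)
collapsed = rng.standard_normal((2000, 4)) @ rng.standard_normal((4, 64))
spread = rng.standard_normal((2000, 64))
print(effective_rank(collapsed), effective_rank(spread))
```

Under this definition the value is bounded above by the number of nonzero singular values, so a 64-dimensional embedding that varies along only 4 directions cannot score above 4.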
Abstract
Knowledge distillation between asymmetric architectures often induces severe geometric constraints on the learned representation space. In this work, we investigate the Dimensional Collapse phenomenon when distilling global Vision Transformers (CLIP and DINOv2) into capacity-constrained CNNs. By employing strictly centered SVD and Effective Rank, we first demonstrate a capacity-agnostic phase transition on CIFAR-10 where standard cosine distillation collapses representations to an intrinsic Effective Rank of ~16. To reverse this, we integrate an auxiliary contrastive objective (InfoNCE), expanding the student's manifold by 2.4x (to ~38 effective dimensions). We further demonstrate that while DINOv2's uniform geometry partially prevents collapse, contrastive expansion remains a universal requirement to reach the CNN's topological capacity limit (~82 dimensions). Finally, we reveal a…
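A minimal sketch of how a cosine distillation term might be combined with an auxiliary InfoNCE term is given here. The truncated abstract does not specify how positives and negatives are formed, so several details are assumptions: student features are taken to be already projected to the teacher's dimension, the positive for each student embedding is the teacher embedding of the same image, and the other teacher embeddings in the batch serve as negatives. The `temperature` and `weight` parameters and all function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def cosine_distill_loss(student, teacher):
    """Mean (1 - cosine similarity) between matched student/teacher rows."""
    s, t = l2_normalize(student), l2_normalize(teacher)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))

def info_nce_loss(student, teacher, temperature=0.1):
    """InfoNCE with in-batch negatives (assumed pairing: row i <-> row i)."""
    s, t = l2_normalize(student), l2_normalize(teacher)
    logits = s @ t.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct "class" for row i is column i.
    return float(-np.mean(np.diag(log_prob)))

def combined_loss(student, teacher, weight=1.0, temperature=0.1):
    """Cosine distillation plus a weighted contrastive term (hypothetical weighting)."""
    return cosine_distill_loss(student, teacher) + weight * info_nce_loss(
        student, teacher, temperature
    )

rng = np.random.default_rng(0)
teacher = rng.standard_normal((8, 16))
student = teacher + 0.1 * rng.standard_normal((8, 16))
print(combined_loss(student, teacher))
```

The intuition behind this pairing is that the cosine term only pulls each student embedding toward its own teacher target, while the InfoNCE term also penalizes similarity to the other samples in the batch. A student that maps every input to the same vector incurs an InfoNCE loss of at least log N for a batch of size N, which gives the objective a direct pressure against collapse.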
Taxonomy
Topics: Advanced Memory and Neural Computing · Neural Networks and Reservoir Computing · Advanced Neural Network Applications
