DARK: Diagonal-Anchored Repulsive Knowledge Distillation for Vision-Language Models under Extreme Compression

Numan Saeed; Asif Hanif; Fadillah Adamsyah Maani; Hussain Alasmawi; Mohammad Yaqub

arXiv:2603.05421·cs.CV·May 8, 2026

TL;DR

This paper introduces DARK, a contrastive knowledge distillation method that improves extreme model compression for vision-language tasks by encouraging structured decorrelation and repulsion of non-target similarities.

Contribution

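DARK decomposes the distillation loss into a diagonal term (matched image-text pairs) and an off-diagonal term (non-target similarities). The diagonal term anchors matched-pair alignment throughout training, while the off-diagonal weight is annealed from positive to negative, moving the student from imitation to repulsion. This is aimed at compressing large models into much smaller, more efficient ones.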

Findings

01. Student matches or exceeds teacher on zero-shot benchmarks.

02. DARK induces structured decorrelation, reducing inter-class confusion.

03. Efficiently compresses a 427M-parameter model into a 75M-parameter model.

Abstract

Compressing vision-language models for on-device deployment is increasingly important in clinical settings, but knowledge distillation (KD) degrades sharply when the teacher-student capacity gap spans an order of magnitude or more. We argue that, under such gaps, strict imitation of the teacher is a poor objective: much of the teacher's pairwise similarity structure reflects its own architectural biases rather than information a compact student can efficiently represent. We propose Diagonal-Anchored Repulsive Knowledge Distillation (DARK), a contrastive KD framework that decomposes the distillation loss into a diagonal term (matched image-text pairs) and an off-diagonal term (non-target similarities). The diagonal term anchors matched-pair alignment throughout training; the off-diagonal term is annealed from positive to negative weighting, transitioning the student from…
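The decomposition described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact loss, so the squared-error form, the linear annealing schedule, the endpoint weights (+1 to -1), and the function names `off_diagonal_weight` and `split_distillation_loss` are all assumptions made for illustration. It only shows how an image-by-text similarity matrix splits into a diagonal term (matched pairs) and an off-diagonal term whose weight changes sign during training.

```python
"""Illustrative sketch only: diagonal/off-diagonal split of a similarity-matching loss."""


def off_diagonal_weight(step, total_steps, w_start=1.0, w_end=-1.0):
    """Anneal the off-diagonal weight from w_start (positive) to w_end (negative).

    ASSUMPTION: the linear shape and the endpoint values are hypothetical,
    not taken from the paper.
    """
    if total_steps <= 1:
        return w_end
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return w_start + frac * (w_end - w_start)


def split_distillation_loss(student_sim, teacher_sim, off_weight, diag_weight=1.0):
    """Return (total, diagonal_term, off_diagonal_term).

    student_sim and teacher_sim are square lists of lists holding
    image-by-text similarities. ASSUMPTION: mean squared error stands in
    for whatever divergence the paper actually uses.
    """
    n = len(student_sim)
    diag = 0.0
    off = 0.0
    for i in range(n):
        for j in range(n):
            err = (student_sim[i][j] - teacher_sim[i][j]) ** 2
            if i == j:
                diag += err  # matched image-text pair
            else:
                off += err   # non-target similarity
    diag /= n
    off /= max(n * (n - 1), 1)
    return diag_weight * diag + off_weight * off, diag, off


if __name__ == "__main__":
    # Toy 2x2 similarity matrices with made-up values.
    teacher = [[0.9, 0.4], [0.3, 0.8]]
    student = [[0.7, 0.1], [0.2, 0.8]]
    for step in (0, 50, 100):
        w = off_diagonal_weight(step, total_steps=101)
        total, diag, off = split_distillation_loss(student, teacher, w)
        print(step, w, total, diag, off)
```

In this stand-in, the diagonal weight stays fixed while the off-diagonal weight crosses zero midway through training. Once it is negative, the loss rewards the student for moving its off-diagonal similarities away from the teacher's. Whether DARK's repulsion takes this form, and how it is kept bounded, is not stated in the abstract.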

Code & Models

Models

🤗 numansaeed/MobileFetalCLIP (model)
