Training a Student Expert via Semi-Supervised Foundation Model Distillation

Pardis Taghavi, Tian Liu, Renjie Li, Reza Langari, and Zhengzhong Tu

arXiv:2604.03841 · cs.CV · April 7, 2026

TL;DR

This paper presents a semi-supervised knowledge distillation framework that compresses large vision foundation models into smaller, efficient experts for instance segmentation, leveraging limited labeled data and extensive unlabeled data.
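This summary does not spell out the loss terms, so the sketch below only illustrates the generic semi-supervised distillation idea it refers to: ordinary supervision on the few labeled pixels, plus distillation from the teacher's hard pseudo-labels on unlabeled pixels, keeping only pixels where the teacher is confident. The per-pixel classification setup, the confidence threshold `tau`, the weight `lam`, and every function name are hypothetical simplifications, not the paper's instance-segmentation objective.

```python
import math
from typing import List

Probs = List[float]  # per-pixel class probabilities (assumed to sum to 1)


def cross_entropy(p: Probs, target: int, eps: float = 1e-12) -> float:
    """Negative log-probability assigned to the target class."""
    return -math.log(max(p[target], eps))


def supervised_loss(student: List[Probs], labels: List[int]) -> float:
    """Mean cross-entropy on the (scarce) labeled pixels."""
    return sum(cross_entropy(p, y) for p, y in zip(student, labels)) / len(labels)


def pseudo_label_loss(student: List[Probs], teacher: List[Probs], tau: float) -> float:
    """Mean cross-entropy against the teacher's hard pseudo-labels,
    keeping only pixels where the teacher's top probability is >= tau."""
    losses = []
    for ps, pt in zip(student, teacher):
        target = max(range(len(pt)), key=pt.__getitem__)  # teacher argmax class
        if pt[target] >= tau:
            losses.append(cross_entropy(ps, target))
    return sum(losses) / len(losses) if losses else 0.0


def sskd_objective(student_lab, labels, student_unlab, teacher_unlab, tau=0.7, lam=1.0):
    """Hypothetical combined objective: supervised term + lam * distillation term."""
    return supervised_loss(student_lab, labels) + lam * pseudo_label_loss(
        student_unlab, teacher_unlab, tau
    )


if __name__ == "__main__":
    loss = sskd_objective(
        student_lab=[[0.9, 0.1], [0.2, 0.8]], labels=[0, 1],
        student_unlab=[[0.6, 0.4], [0.5, 0.5]],
        teacher_unlab=[[0.95, 0.05], [0.55, 0.45]],  # second pixel is filtered out
    )
    print(round(loss, 4))  # prints 0.6751
```

The confidence filter is the simplest way to keep noisy teacher predictions from dominating the unlabeled term; raising `tau` trades pseudo-label coverage for precision.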

Contribution

The authors introduce a novel three-stage semi-supervised distillation method with an instance-aware contrastive loss for effective model compression and improved segmentation performance.

Findings

1. The student model achieves +11.9 AP over the zero-shot teacher on Cityscapes.
2. The approach surpasses the adapted teachers by +3.4 AP.
3. The method achieves state-of-the-art results on benchmark datasets.

Abstract

Foundation models deliver strong perception but are often too computationally heavy to deploy, and adapting them typically requires costly annotations. We introduce a semi-supervised knowledge distillation (SSKD) framework that compresses pre-trained vision foundation models (VFMs) into compact experts using limited labeled and abundant unlabeled data, and instantiate it for instance segmentation where per-pixel labels are particularly expensive. The framework unfolds in three stages: (1) domain adaptation of the VFM(s) via self-training with contrastive calibration, (2) knowledge transfer through a unified multi-objective loss, and (3) student refinement to mitigate residual pseudo-label bias. Central to our approach is an instance-aware pixel-wise contrastive loss that fuses mask and class scores to extract informative negatives and enforce clear inter-instance margins. By maintaining…
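The abstract describes the instance-aware pixel-wise contrastive loss only at a high level. One plausible reading, sketched below in pure Python, is an InfoNCE-style loss over instance prototypes: each pixel embedding is pulled toward the mean embedding of its own instance, with an additive margin, and pushed away from the prototypes of other instances whose fused score (here assumed to be mask score × class score) passes a threshold, so that only confident competing instances serve as negatives. The prototype formulation, the product fusion, `min_score`, `temp`, `margin`, and all names are assumptions for illustration, not the authors' formulation.

```python
import math
from typing import Dict, List

Vec = List[float]


def normalize(v: Vec) -> Vec:
    """L2-normalize a vector (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def prototypes(emb: List[Vec], inst: List[int]) -> Dict[int, Vec]:
    """Normalized mean of the normalized pixel embeddings of each instance."""
    sums: Dict[int, Vec] = {}
    counts: Dict[int, int] = {}
    for e, k in zip(emb, inst):
        z = normalize(e)
        acc = sums.setdefault(k, [0.0] * len(z))
        sums[k] = [a + x for a, x in zip(acc, z)]
        counts[k] = counts.get(k, 0) + 1
    return {k: normalize([a / counts[k] for a in v]) for k, v in sums.items()}


def instance_contrastive_loss(
    emb: List[Vec], inst: List[int],
    mask_score: Dict[int, float], cls_score: Dict[int, float],
    temp: float = 0.1, margin: float = 0.2, min_score: float = 0.5,
) -> float:
    """Hypothetical pixel-to-prototype contrastive loss.

    Positive: the pixel's own instance prototype, with `margin` subtracted
    from its similarity, so a low loss requires the positive similarity to
    beat the negatives by more than the margin.
    Negatives: prototypes of other instances whose fused score
    mask_score * cls_score is at least `min_score`.
    """
    protos = prototypes(emb, inst)
    fused = {k: mask_score[k] * cls_score[k] for k in protos}
    total = 0.0
    for e, k in zip(emb, inst):
        z = normalize(e)
        pos = math.exp((dot(z, protos[k]) - margin) / temp)
        neg = sum(
            math.exp(dot(z, protos[j]) / temp)
            for j in protos
            if j != k and fused[j] >= min_score
        )
        total += -math.log(pos / (pos + neg))
    return total / len(emb)


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    scores = {0: 0.9, 1: 0.9}
    separated = instance_contrastive_loss(emb, [0, 0, 1, 1], scores, scores)
    mixed = instance_contrastive_loss(emb, [0, 1, 0, 1], scores, scores)
    print(separated < mixed)  # prints True
```

Gating negatives on the fused score means that a low-confidence instance (for example, one with a good mask but an uncertain class) does not push pixels around; in this sketch, an anchor with no qualifying negatives contributes zero loss.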