Distilling Large Vision-Language Model with Out-of-Distribution Generalizability

Xuanlin Li; Yunhao Fang; Minghua Liu; Zhan Ling; Zhuowen Tu; Hao Su

arXiv:2307.03135·cs.CV·October 13, 2023

TL;DR

This paper presents a method for distilling large vision-language models into smaller models that excel in out-of-distribution generalization, especially in open-vocabulary tasks, by enhancing visual and semantic representations.

Contribution

It introduces two principles for improving OOD generalization in distilled models, focusing on visual representation imitation and semantic attribute enrichment.

Findings

01. Significant improvements in zero-shot OOD classification
02. Enhanced few-shot performance on open-vocabulary tasks
03. Effective distillation techniques for out-of-distribution generalization

Abstract

Large vision-language models have achieved outstanding performance, but their size and computational requirements make their deployment on resource-constrained devices and in time-sensitive tasks impractical. Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction toward a solution. This paper investigates the distillation of visual representations in large teacher vision-language models into lightweight student models using a small- or mid-scale dataset. Notably, this study focuses on open-vocabulary out-of-distribution (OOD) generalization, a challenging problem that has been overlooked in previous model distillation literature. We propose two principles from vision and language modality perspectives to enhance the student's OOD generalization: (1) by better imitating the teacher's visual representation space,…
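
The abstract is truncated here, so the paper's actual objectives are not shown. As a rough illustration of the general idea of imitating a teacher's visual representation space, the sketch below pulls a student image embedding toward a teacher embedding and then classifies open-vocabulary labels by cosine similarity to text embeddings. The function names (`l2_normalize`, `imitation_loss`, `zero_shot_predict`), the squared-distance loss, and the toy vectors are all assumptions made for illustration; they are not taken from the paper or its repository.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def imitation_loss(student_feat, teacher_feat):
    """Squared L2 distance between unit-normalized student and teacher features.

    Assumed, generic feature-matching loss; not the paper's stated objective.
    """
    s = l2_normalize(student_feat)
    t = l2_normalize(teacher_feat)
    return sum((a - b) ** 2 for a, b in zip(s, t))


def zero_shot_predict(image_feat, label_feats):
    """Return the label whose text embedding is most cosine-similar to the image."""
    img = l2_normalize(image_feat)
    scores = {
        label: sum(a * b for a, b in zip(img, l2_normalize(feat)))
        for label, feat in label_feats.items()
    }
    return max(scores, key=scores.get)


# Toy, made-up embeddings purely for illustration.
teacher = [0.6, 0.8, 0.0]
student = [0.5, 0.9, 0.1]
labels = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}

loss = imitation_loss(student, teacher)    # small positive value
pred = zero_shot_predict(student, labels)  # label closest to the student feature
```

Because prediction only compares an image embedding against label embeddings, new labels can be added at test time without retraining, which is what makes a well-aligned student usable in open-vocabulary settings.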

Code & Models

Repositories

xuanlinli17/large_vlm_distillation_ood (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques