Improving CLIP Robustness with Knowledge Distillation and Self-Training

Clement Laroudie; Andrei Bursuc; Mai Lan Ha; Gianni Franchi

arXiv:2309.10361 · cs.CV · September 20, 2023 · 2 cites

TL;DR

This paper introduces LP-CLIP, a novel method that enhances CLIP's robustness using knowledge distillation and self-training with pseudo-labels, without requiring annotated data, achieving state-of-the-art results.

Contribution

The paper proposes LP-CLIP, a new approach that improves CLIP's robustness through a linear probing layer trained with pseudo-labels, eliminating the need for annotated data.

Findings

01. LP-CLIP outperforms supervised methods on multiple datasets.
02. The approach enhances robustness without relying on labeled data.
03. State-of-the-art results demonstrate effectiveness across various scenarios.

Abstract

This paper examines the robustness of the multi-modal computer vision model CLIP (Contrastive Language-Image Pretraining) in the context of unsupervised learning. The objective is twofold: first, to evaluate the robustness of CLIP, and second, to explore strategies for improving it. To this end, we introduce a novel approach named LP-CLIP, which distills CLIP features by adding a linear probing layer on top of CLIP's encoder. This layer is trained on pseudo-labels produced by CLIP, combined with a self-training strategy. LP-CLIP thus offers a promising way to enhance the robustness of CLIP without the need for annotations. By leveraging a simple linear probing layer, we aim to improve the model's ability to withstand various uncertainties and challenges commonly…
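
The abstract describes the core loop only in prose: CLIP produces pseudo-labels, and a linear layer on top of the frozen encoder is trained on them with self-training. The NumPy sketch below illustrates that idea under several assumptions that are not taken from the paper. Random Gaussian vectors stand in for CLIP image and text embeddings. Pseudo-labels come from standard CLIP-style zero-shot cosine similarity with a temperature. The linear probe is a softmax classifier fit by full-batch gradient descent. The self-training step relabels the data with the probe's own predictions and keeps only those above a fixed confidence threshold. The function names (`zero_shot_pseudo_labels`, `train_linear_probe`, `self_train_probe`) and the values of `temperature`, `threshold`, `lr`, `epochs`, and `rounds` are hypothetical; the authors' actual losses, augmentations, and training schedule may differ.

```python
# Hypothetical sketch of "linear probe + CLIP pseudo-labels + self-training";
# not the authors' implementation. Synthetic vectors replace real CLIP features.
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length, as CLIP does before taking cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def zero_shot_pseudo_labels(image_feats, text_feats, temperature=0.01):
    """Teacher: CLIP-style zero-shot scores -> hard pseudo-labels and their confidence."""
    logits = l2_normalize(image_feats) @ l2_normalize(text_feats).T / temperature
    probs = softmax(logits)
    return probs.argmax(axis=1), probs.max(axis=1)


def train_linear_probe(feats, labels, num_classes, lr=0.5, epochs=200):
    """Student: softmax linear layer fit to pseudo-labels by full-batch gradient descent."""
    n, d = feats.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        probs = softmax(feats @ W + b)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def self_train_probe(image_feats, text_feats, rounds=3, threshold=0.5):
    """Hypothetical self-training loop: distil zero-shot labels, then relabel with the probe."""
    num_classes = text_feats.shape[0]
    feats = l2_normalize(image_feats)  # encoder stays frozen; only W and b are learned
    labels, conf = zero_shot_pseudo_labels(image_feats, text_feats)
    for _ in range(rounds):
        keep = conf >= threshold  # train only on confident pseudo-labels
        if not keep.any():
            keep = np.ones_like(keep)  # fall back to all samples if none pass
        W, b = train_linear_probe(feats[keep], labels[keep], num_classes)
        probs = softmax(feats @ W + b)
        labels, conf = probs.argmax(axis=1), probs.max(axis=1)  # probe relabels the data
    return W, b


# Synthetic stand-ins for CLIP embeddings (assumption: no real CLIP model is loaded).
rng = np.random.default_rng(0)
num_classes, dim, n = 3, 32, 300
text_feats = rng.normal(size=(num_classes, dim))    # one "prompt" embedding per class
true_labels = rng.integers(0, num_classes, size=n)  # used only for evaluation
image_feats = text_feats[true_labels] + 0.5 * rng.normal(size=(n, dim))

W, b = self_train_probe(image_feats, text_feats)
preds = (l2_normalize(image_feats) @ W + b).argmax(axis=1)
accuracy = (preds == true_labels).mean()
print(f"probe accuracy vs. ground truth: {accuracy:.2f}")
```

Keeping the encoder frozen means the only trainable parameters are a `dim × num_classes` matrix and a bias, which is what makes a linear probe cheap to train. The confidence filter is one common way to limit the effect of wrong pseudo-labels; the abstract does not state whether LP-CLIP uses such a filter.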

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Language-Image Pre-training