Realistic Unsupervised CLIP Fine-tuning with Universal Entropy Optimization
Jian Liang, Lijun Sheng, Zhengbo Wang, Ran He, Tieniu Tan

TL;DR
This paper introduces Universal Entropy Optimization (UEO), a simple and efficient method for realistic unsupervised fine-tuning of CLIP that improves both out-of-distribution detection and known-class recognition without relying on class labels.
Contribution
The paper proposes UEO, a novel approach that uses sample-level confidence to guide entropy optimization and tunes channel-wise affine transformations for unsupervised CLIP fine-tuning in realistic scenarios.
Findings
UEO outperforms baseline methods across 15 domains.
It effectively enhances out-of-distribution detection.
UEO improves recognition of known classes without supervision.
Abstract
The emergence of vision-language models, such as CLIP, has spurred a significant research effort towards their application for downstream supervised learning tasks. Although some previous studies have explored the unsupervised fine-tuning of CLIP, they often rely on prior knowledge in the form of class names associated with ground truth labels. This paper explores a realistic unsupervised fine-tuning scenario, considering the presence of out-of-distribution samples from unknown classes within the unlabeled data. In particular, we focus on simultaneously enhancing out-of-distribution detection and the recognition of instances associated with known classes. To tackle this problem, we present a simple, efficient, and effective approach called Universal Entropy Optimization (UEO). UEO leverages sample-level confidence to approximately minimize the conditional entropy of confident instances…
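The confidence-weighted entropy idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, and both the choice of maximum softmax probability as the confidence score and the normalization of weights over the batch are assumptions made for illustration. Only the conditional-entropy term stated in the abstract is shown.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def confidence_weighted_entropy(batch_logits):
    """Entropy averaged over a batch, weighted by per-sample confidence.

    ASSUMPTION: confidence is the maximum softmax probability, and the
    weights are normalized to sum to 1 over the batch. Confident samples
    therefore dominate the term that would be minimized.
    """
    probs = [softmax(logits) for logits in batch_logits]
    conf = [max(p) for p in probs]
    total = sum(conf)
    weights = [c / total for c in conf]
    return sum(w * entropy(p) for w, p in zip(weights, probs))

# Hypothetical logits: one confident sample and one maximally uncertain sample.
batch = [[10.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
weighted = confidence_weighted_entropy(batch)
plain_mean = sum(entropy(softmax(l)) for l in batch) / len(batch)
```

In this toy batch the uncertain sample receives a smaller weight, so the weighted value is lower than the plain mean entropy; under the stated assumptions, minimizing it would mostly sharpen predictions that are already confident.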
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Focus · Contrastive Language-Image Pre-training
