Online Zero-Shot Classification with CLIP
Qi Qian, Juhua Hu

TL;DR
This paper introduces OnZeta, an online zero-shot classification framework built on CLIP. It adapts to the target data distribution during inference and achieves high accuracy without storing data, which makes it suitable for real-time applications.
Contribution
The paper proposes a novel online zero-shot transfer method that models the target data distribution and optimizes class proxies in real time, with theoretical convergence guarantees.
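Adapting class proxies on the fly can be illustrated with a minimal sketch. It is not OnZeta's actual update rule. It assumes L2-normalized CLIP embeddings are already available as plain Python lists and starts one adaptable proxy per class from that class's text embedding. After each prediction, it applies a soft k-means-style step that pulls every proxy toward the current image in proportion to the predicted probability. The class `OnlineProxyClassifier` and the parameters `tau`, `lr`, and `anchor` are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (CLIP compares embeddings by cosine similarity)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class OnlineProxyClassifier:
    """Hypothetical sketch, not OnZeta's algorithm: proxies start at the text
    embeddings and drift toward arriving images; images are never stored."""

    def __init__(self, text_proxies, tau=0.01, lr=0.1, anchor=0.5):
        self.text = [normalize(w) for w in text_proxies]  # fixed proxies from class names
        self.proxies = [list(w) for w in self.text]  # adaptable copies
        self.tau = tau  # softmax temperature (assumed value)
        self.lr = lr  # step size of the proxy update (assumed value)
        self.anchor = anchor  # weight on the fixed text proxies when scoring

    def predict_and_update(self, image_emb):
        x = normalize(image_emb)
        # Score each class with a mix of its fixed text proxy and its adapted proxy.
        logits = [
            (self.anchor * dot(t, x) + (1 - self.anchor) * dot(w, x)) / self.tau
            for t, w in zip(self.text, self.proxies)
        ]
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        # Soft k-means-style step: move each proxy toward x, weighted by its
        # predicted probability, then renormalize. x is discarded afterwards.
        for k, p in enumerate(probs):
            self.proxies[k] = normalize(
                [w + self.lr * p * xi for w, xi in zip(self.proxies[k], x)]
            )
        return label, probs


clf = OnlineProxyClassifier([[1.0, 0.0], [0.0, 1.0]])
label, probs = clf.predict_and_update([0.9, 0.1])
```

Each call touches only the current embedding, so memory stays at O(K·d) for K classes of dimension d. Keeping the fixed text proxies in the score (via `anchor`) is one simple way to stop the adapted proxies from drifting away from the class names.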
Findings
Achieves 78.94% accuracy on ImageNet without access to the full dataset.
Improves performance by over 3% on average across 13 downstream tasks.
Demonstrates effective online adaptation for zero-shot classification.
Abstract
Vision-language pre-training such as CLIP enables zero-shot transfer that can classify images according to the candidate class names. While CLIP demonstrates impressive zero-shot performance on diverse downstream tasks, the distribution of the target data has not been leveraged sufficiently. In this work, we study a novel online zero-shot transfer scenario, where images arrive in random order for classification and each is visited only once to obtain its prediction immediately, without storing its representation. Compared with vanilla zero-shot classification, the proposed framework preserves its flexibility for online service while using the statistics of the arrived images as side information to capture the distribution of the target data, which can help improve performance in real-world applications. To tackle the challenge of effective online optimization, we first…
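Using the statistics of arrived images as side information can be sketched as online dual ascent on a class-balance constraint. This is a generic stand-in rather than the paper's formulation. It assumes the zero-shot logits for each image are computed upstream and that the target class prior is known (uniform by default). It keeps only O(K) state for K classes, so no image representation is retained. `OnlineBalancedLabeler`, `eta`, `rho`, and `prior` are hypothetical names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


class OnlineBalancedLabeler:
    """Hypothetical sketch, not OnZeta's exact method: shift zero-shot logits
    online so the running prediction distribution tracks a target prior."""

    def __init__(self, num_classes, eta=0.5, prior=None):
        self.k = num_classes
        self.prior = prior or [1.0 / num_classes] * num_classes  # assumed prior
        self.rho = [0.0] * num_classes  # one dual variable per class
        self.counts = [0.0] * num_classes  # running sum of soft predictions
        self.eta = eta  # base step size (assumed value)
        self.t = 0  # number of images seen so far

    def step(self, logits):
        """Consume one image's zero-shot logits and return (label, probs).
        Only the aggregate statistics are kept; the image itself is dropped."""
        adjusted = [z - r for z, r in zip(logits, self.rho)]
        probs = softmax(adjusted)
        self.t += 1
        for j in range(self.k):
            self.counts[j] += probs[j]
            # Dual ascent with a decaying step: raise rho for classes that are
            # predicted more often than the prior allows, lower it otherwise.
            self.rho[j] += self.eta * (probs[j] - self.prior[j]) / math.sqrt(self.t)
        label = max(range(self.k), key=probs.__getitem__)
        return label, probs

    def label_distribution(self):
        """Average soft prediction over all images seen so far."""
        return [c / self.t for c in self.counts]


labeler = OnlineBalancedLabeler(num_classes=2)
for _ in range(400):
    label, probs = labeler.step([2.0, 0.0])  # a stream whose raw logits favor class 0
```

With a stream whose raw logits always favor class 0, `rho` gradually offsets that bias, so later predictions move toward the prior while `label_distribution()` still records the skew seen so far. The step size decays as `eta / sqrt(t)`, a common choice in online convex optimization.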
