LiteEmbed: Adapting CLIP to Rare Classes

Aishwarya Agarwal; Srikrishna Karanam; Vineet Gandhi

arXiv:2601.09661 · cs.CV · January 15, 2026



TL;DR

LiteEmbed is a lightweight method that adapts CLIP to classes rarely seen during pretraining. It optimizes text embeddings rather than retraining the encoders, and the resulting embeddings can be used across classification, retrieval, segmentation, and detection.

Contribution

The paper introduces a PCA-based, subspace-guided optimization of CLIP's text embeddings that enables few-shot personalization without retraining the encoders.
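As a rough illustration of what a PCA-based split of a text-embedding space can look like, the sketch below decomposes one embedding into a "coarse" and a "fine" component. It is not the authors' implementation: random vectors stand in for CLIP's vocabulary text embeddings, the top `k` principal directions are assumed to play the role of the coarse subspace, and the names `pca_split`, `decompose`, and the chosen sizes are hypothetical.

```python
import numpy as np

def pca_split(vocab_emb, k):
    """PCA of the vocabulary embeddings; returns (mean, coarse_basis, fine_basis)."""
    mean = vocab_emb.mean(axis=0)
    centered = vocab_emb - mean
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Assumption: the top-k directions are "coarse", the rest are "fine".
    return mean, vt[:k], vt[k:]

def decompose(e, mean, coarse_basis, fine_basis):
    """Project a single embedding onto the two orthogonal subspaces."""
    c = e - mean
    coarse = coarse_basis.T @ (coarse_basis @ c)
    fine = fine_basis.T @ (fine_basis @ c)
    return coarse, fine

rng = np.random.default_rng(0)
vocab_emb = rng.normal(size=(200, 32))  # stand-in for CLIP vocabulary text embeddings
mean, coarse_basis, fine_basis = pca_split(vocab_emb, k=8)

e = vocab_emb[0]
coarse, fine = decompose(e, mean, coarse_basis, fine_basis)
```

Because the two bases are orthogonal and together span the whole space in this toy setup, the coarse and fine components sum back to the centered embedding. An optimizer could therefore, in principle, constrain or adjust each part separately.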

Findings

01. Significant performance improvements over prior methods.

02. Effective across classification, retrieval, segmentation, and detection tasks.

03. Seamless plug-and-play integration with CLIP.

Abstract

Large-scale vision-language models such as CLIP achieve strong zero-shot recognition but struggle with classes that are rarely seen during pretraining, including newly emerging entities and culturally specific categories. We introduce LiteEmbed, a lightweight framework for few-shot personalization of CLIP that enables new classes to be added without retraining its encoders. LiteEmbed performs subspace-guided optimization of text embeddings within CLIP's vocabulary, leveraging a PCA-based decomposition that disentangles coarse semantic directions from fine-grained variations. Two complementary objectives, coarse alignment and fine separation, jointly preserve global semantic consistency while enhancing discriminability among visually similar classes. Once optimized, the embeddings are plug-and-play, seamlessly substituting CLIP's original text features across classification, retrieval,…
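The plug-and-play idea in the abstract can be sketched with CLIP-style cosine-similarity classification: a new class is added by appending one text embedding to the matrix of class text features, while the image side stays untouched. This is a minimal sketch under stated assumptions, not the paper's code. Random vectors stand in for real CLIP features, `new_class_emb` stands in for an already optimized embedding, and `classify` is a hypothetical helper.

```python
import numpy as np

def l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def classify(image_feats, text_embs):
    """Assign each image to the class whose text embedding is most cosine-similar."""
    sims = l2_normalize(image_feats) @ l2_normalize(text_embs).T
    return sims.argmax(axis=1)

rng = np.random.default_rng(0)
text_embs = rng.normal(size=(5, 32))  # stand-in for the original class text features
new_class_emb = rng.normal(size=32)   # stand-in for an optimized rare-class embedding

# Adding the class only extends the text-feature matrix by one row.
extended = np.vstack([text_embs, new_class_emb])

# A toy image feature lying close to the new class embedding.
image_feats = new_class_emb[None, :] + 0.01 * rng.normal(size=(1, 32))
pred = classify(image_feats, extended)
```

In this toy setup the image is assigned to the appended row (index 5). No encoder weights are involved in the update, which is what would let the same embeddings be reused by any pipeline that scores image features against text features.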


Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Topic Modeling