FLIER: Few-shot Language Image Models Embedded with Latent Representations
Zhinuo Zhou, Peng Zhou, Xiaoyong Pan

TL;DR
FLIER leverages latent representations from diffusion models combined with CLIP to enhance few-shot image classification, achieving state-of-the-art results across multiple datasets.
Contribution
Introduces a novel latent encoder trained with CLIP, integrating diffusion model representations for improved few-shot learning performance.
Findings
State-of-the-art results on 11 datasets for few-shot classification.
Effective transfer of pre-trained knowledge through joint training.
Simpler latent encoder architecture compared to existing models.
Abstract
With the rapid development of large vision-language models such as Contrastive Language-Image Pre-training (CLIP), many CLIP-like methods have shown impressive abilities in visual recognition, especially in low-data regimes. However, most of these methods are limited to modifying the text and image encoders. Recently, latent diffusion models (LDMs) have shown strong image generation ability. The capabilities of LDMs direct our focus towards the latent representations sampled by the UNet. Inspired by the conjecture in CoOp that learned prompts encode meanings beyond the existing vocabulary, we assume that, for deep models, the latent representations are a concise and accurate understanding of images, in which high-frequency, imperceptible details are abstracted away. In this paper, we propose a Few-shot Language Image model Embedded with…
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Topic Modeling
Methods: Residual Connection · Attention Is All You Need · Linear Layer · Weight Decay · Cosine Annealing · Dropout · Byte Pair Encoding · Softmax
