Low-Rank Few-Shot Adaptation of Vision-Language Models

Maxime Zanella; Ismail Ben Ayed

arXiv:2405.18541 · cs.CV · June 4, 2024 · 1 citation

TL;DR

This paper applies Low-Rank Adaptation (LoRA) to few-shot learning in Vision-Language Models (VLMs). Across 11 datasets, it reports substantial improvements over existing prompt- and adapter-based methods, with shorter training.

Contribution

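The paper brings Low-Rank Adaptation (LoRA), an existing Parameter-Efficient Fine-Tuning (PEFT) technique, to few-shot adaptation of VLMs. The resulting method, CLIP-LoRA, is a simple and effective alternative to prompt- and adapter-based methods, and it uses the same hyper-parameters across tasks.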

Findings

01. CLIP-LoRA outperforms state-of-the-art methods on 11 datasets.

02. Training times are significantly reduced with LoRA.

03. Hyper-parameters remain consistent across tasks.

Abstract

Recent progress in the few-shot adaptation of Vision-Language Models (VLMs) has further pushed their generalization capabilities, at the expense of just a few labeled samples within the target downstream task. However, this promising, already quite abundant few-shot literature has focused principally on prompt learning and, to a lesser extent, on adapters, overlooking the recent advances in Parameter-Efficient Fine-Tuning (PEFT). Furthermore, existing few-shot learning methods for VLMs often rely on heavy training procedures and/or carefully chosen, task-specific hyper-parameters, which might impede their applicability. In response, we introduce Low-Rank Adaptation (LoRA) in few-shot learning for VLMs, and show its potential on 11 datasets, in comparison to current state-of-the-art prompt- and adapter-based approaches. Surprisingly, our simple CLIP-LoRA method exhibits substantial…
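To make the low-rank idea concrete, here is a minimal NumPy sketch of a generic LoRA layer: a frozen pretrained weight W plus a trainable update (alpha / r) * B A, where A and B have rank r. This illustrates the general technique only, not the authors' CLIP-LoRA implementation. The class name `LoRALinear`, the `rank` and `alpha` values, the initialization, and the layer sizes are all assumptions chosen for illustration.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration of generic LoRA; not the paper's code.
    """

    def __init__(self, weight, rank=2, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                               # frozen, shape (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))  # trainable, small random init
        self.B = np.zeros((d_out, rank))                   # trainable, zero init: update starts at 0
        self.scale = alpha / rank

    def delta(self):
        # Low-rank weight update; its rank is at most `rank`.
        return self.scale * (self.B @ self.A)

    def forward(self, x):
        # x has shape (batch, d_in); the update is applied without forming delta().
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        return self.A.size + self.B.size


# Usage with assumed sizes: a 64x32 frozen weight and a rank-2 update.
W = np.random.default_rng(1).normal(size=(64, 32))
layer = LoRALinear(W, rank=2)
x = np.ones((4, 32))
out = layer.forward(x)

print(out.shape)                      # (4, 64)
print(layer.num_trainable(), W.size)  # 192 2048
```

Because B starts at zero, the layer initially reproduces the frozen model's output exactly, and only the two small factors would be updated during few-shot training. In this toy setup that is 192 trainable values against 2048 frozen ones.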

Taxonomy

Topics: Multimodal Machine Learning Applications