Making Large Vision Language Models to be Good Few-shot Learners

Fan Liu; Wenwen Cai; Jian Huo; Chuanyi Zhang; Delong Chen; Jun Zhou

arXiv:2408.11297·cs.CV·August 22, 2024

TL;DR

This paper improves the few-shot classification ability of large vision language models (LVLMs) through meta-learning-based instruction fine-tuning, label augmentation, and candidate selection, and reports better performance across multiple datasets.

Contribution

The paper introduces a meta-learning-based instruction fine-tuning method, combined with label augmentation and candidate selection, to enhance the few-shot classification capabilities of LVLMs.

Findings

1. Achieves superior few-shot classification performance on multiple datasets.

2. Label augmentation via character perturbation improves the model's focus on support data.

3. Candidate selection with attribute descriptions benefits training-free LVLMs.
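
The label augmentation finding can be illustrated with a small example. This summary does not specify the perturbation scheme, so the following is only a minimal Python sketch of one plausible variant, assuming random letter substitution inside class names. The names `perturb_label` and `augment_labels` and the parameter `n_edits` are hypothetical and are not taken from the paper.

```python
import random
import string
from typing import Dict, List, Optional


def perturb_label(label: str, n_edits: int = 1,
                  rng: Optional[random.Random] = None) -> str:
    """Replace up to `n_edits` letters of `label` with different random letters.

    Assumption: substitution only (no insertion or deletion), so the
    perturbed label keeps its length, spaces, and punctuation.
    """
    rng = rng or random.Random()
    chars = list(label)
    # Only alphabetic positions are eligible for perturbation.
    positions = [i for i, c in enumerate(chars) if c.isalpha()]
    for i in rng.sample(positions, min(n_edits, len(positions))):
        # Pick a lowercase letter that differs from the original character.
        choices = [c for c in string.ascii_lowercase if c != chars[i].lower()]
        chars[i] = rng.choice(choices)
    return "".join(chars)


def augment_labels(labels: List[str], n_edits: int = 1,
                   seed: int = 0) -> Dict[str, str]:
    """Map each original class name to a perturbed version."""
    rng = random.Random(seed)
    return {label: perturb_label(label, n_edits, rng) for label in labels}
```

The intuition, under this assumption, is that a perturbed class name no longer matches anything the model memorized during pre-training, so the correct answer can only be recovered by reading the support examples in the prompt.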

Abstract

Few-shot classification (FSC) is a fundamental yet challenging task in computer vision that involves recognizing novel classes from limited data. While previous methods have focused on enhancing visual features or incorporating additional modalities, Large Vision Language Models (LVLMs) offer a promising alternative due to their rich knowledge and strong visual perception. However, LVLMs risk learning specific response formats rather than effectively extracting useful information from support data in FSC tasks. In this paper, we investigate LVLMs' performance in FSC and identify key issues such as insufficient learning and severe positional biases. To tackle these challenges, we adopt a meta-learning strategy that teaches models to "learn to learn". Instruction fine-tuning on a rich set of meta-tasks enhances LVLMs' ability to extract information from…
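
To make the notion of a meta-task concrete, the following is a minimal Python sketch of standard N-way K-shot episode sampling, the usual way meta-learning builds small classification tasks from a labeled dataset. It is an illustration under stated assumptions and not the authors' pipeline: the function name `sample_meta_task`, its parameters, and the choice to shuffle the support set are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Any, List, Optional, Tuple

Example = Tuple[Any, str]  # (item, class label)


def sample_meta_task(dataset: List[Example], n_way: int = 5, k_shot: int = 1,
                     n_query: int = 1, seed: Optional[int] = None
                     ) -> Tuple[List[Example], List[Example]]:
    """Sample one N-way K-shot task: a support set and a query set."""
    rng = random.Random(seed)

    # Group items by class label.
    by_class = defaultdict(list)
    for item, label in dataset:
        by_class[label].append(item)

    # Keep only classes with enough items for both support and query.
    eligible = [c for c, items in by_class.items()
                if len(items) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with sufficient examples")

    support: List[Example] = []
    query: List[Example] = []
    for c in rng.sample(eligible, n_way):
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += [(x, c) for x in picked[:k_shot]]
        query += [(x, c) for x in picked[k_shot:]]

    # Assumption: shuffle the support order so that no class always
    # occupies the same position in the prompt.
    rng.shuffle(support)
    return support, query
```

In an instruction fine-tuning setting, each sampled task would then be rendered as one prompt: the support examples with their labels, followed by a query image whose label the model must produce. How the paper formats these prompts is not stated in this excerpt.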

Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: Sparse Evolutionary Training