FewMMBench: A Benchmark for Multimodal Few-Shot Learning
Mustafa Dogan, Ilker Kesen, Iacer Calixto, Aykut Erdem, Erkut Erdem

TL;DR
FewMMBench is a benchmark for evaluating the few-shot learning abilities of multimodal large language models across diverse tasks, comparing their performance across prompting strategies and model families.
Contribution
Introduces FewMMBench, a comprehensive benchmark for assessing multimodal LLMs in few-shot scenarios across diverse tasks and prompting methods, with an evaluation of 26 open-weight models from six model families.
Findings
Instruction-tuned models perform well zero-shot but show limited or negative gains with few-shot or CoT prompts.
Retrieval-based demonstrations and larger context sizes provide minimal improvements.
FewMMBench serves as a rigorous tool for diagnosing and improving multimodal few-shot learning.
Abstract
As multimodal large language models (MLLMs) advance in handling interleaved image-text data, assessing their few-shot learning capabilities remains an open challenge. In this paper, we introduce FewMMBench, a comprehensive benchmark designed to evaluate MLLMs under few-shot conditions, with a focus on In-Context Learning (ICL) and Chain-of-Thought (CoT) prompting. Covering a diverse suite of multimodal understanding tasks, from attribute recognition to temporal reasoning, FewMMBench enables systematic analysis across task types, model families, and prompting strategies. We evaluate 26 open-weight MLLMs from six model families across zero-shot, few-shot, and CoT-augmented few-shot settings. Our findings reveal that instruction-tuned models exhibit strong zero-shot performance but benefit minimally, or even regress, with additional demonstrations or CoT reasoning. Retrieval-based…
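The three evaluation settings named in the abstract (zero-shot, few-shot, and CoT-augmented few-shot) differ only in what precedes the query in the interleaved image-text prompt. The sketch below is a minimal illustration of that difference, not the benchmark's actual prompt format: the `Example` type, the `build_prompt` helper, the segment dictionaries, the file names, and the "So the answer is" template are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    image: str                       # hypothetical image reference (path or ID)
    question: str
    answer: str
    rationale: Optional[str] = None  # reasoning text, used only for CoT demonstrations


def build_prompt(query: Example, demos: List[Example], use_cot: bool = False) -> List[dict]:
    """Assemble an interleaved image-text prompt as a list of segments.

    An empty `demos` list gives the zero-shot setting, a non-empty list gives
    few-shot ICL, and `use_cot=True` prepends each demonstration's rationale
    to its answer (CoT-augmented few-shot).
    """
    segments: List[dict] = []
    for demo in demos:
        segments.append({"type": "image", "value": demo.image})
        answer = demo.answer
        if use_cot and demo.rationale:
            # Assumed template: rationale first, then the final answer.
            answer = f"{demo.rationale} So the answer is {demo.answer}."
        segments.append(
            {"type": "text", "value": f"Question: {demo.question}\nAnswer: {answer}"}
        )
    # The query always comes last and leaves the answer slot open.
    segments.append({"type": "image", "value": query.image})
    segments.append({"type": "text", "value": f"Question: {query.question}\nAnswer:"})
    return segments


# Hypothetical demonstration and query, for illustration only.
demos = [
    Example(
        image="demo1.jpg",
        question="What color is the car?",
        answer="red",
        rationale="The car in the foreground is painted red.",
    )
]
query = Example(image="query.jpg", question="What color is the bus?", answer="")

zero_shot = build_prompt(query, [])
few_shot = build_prompt(query, demos)
cot_few_shot = build_prompt(query, demos, use_cot=True)
```

Keeping the query construction identical across settings means any change in model accuracy can be attributed to the demonstrations or rationales alone, which is the comparison the findings above describe.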
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Topic Modeling
