What Factors Affect Multi-Modal In-Context Learning? An In-Depth Exploration

Libo Qin, Qiguang Chen, Hao Fei, Zhi Chen, Min Li, Wanxiang Che

arXiv:2410.20482 · cs.CL · October 29, 2024

TL;DR

This paper investigates the factors influencing Multi-Modal In-Context Learning (MM-ICL) performance, analyzing demonstration retrieval, ordering, and prompt construction across various models and strategies to optimize effectiveness.

Contribution

It provides an in-depth experimental analysis identifying key factors affecting MM-ICL, including retrieval methods, demonstration ordering, and prompt design, offering practical guidance for future improvements.

Findings

01 Multi-modal retrievers are essential for demonstration retrieval.

02 Intra-demonstration ordering is more impactful than inter-demonstration ordering.

03 Including introductory instructions in prompts improves task understanding.

Abstract

Recently, rapid advancements in Multi-Modal In-Context Learning (MM-ICL) have achieved notable success, delivering superior performance across various tasks without requiring additional parameter tuning. However, the underlying rules governing the effectiveness of MM-ICL remain under-explored. To fill this gap, this work investigates the research question: "What factors affect the performance of MM-ICL?" To this end, we conduct extensive experiments on the three core steps of MM-ICL (demonstration retrieval, demonstration ordering, and prompt construction) using 6 vision large language models and 20 strategies. Our findings highlight (1) the necessity of a multi-modal retriever for demonstration retrieval, (2) the importance of intra-demonstration ordering over inter-demonstration ordering, and (3) the enhancement of task comprehension through…
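The three steps named in the abstract (demonstration retrieval, demonstration ordering, prompt construction) can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the `Demo` class, the `<image:...>` placeholder, the toy 2-dimensional embeddings, and every function name are hypothetical, and the embeddings are assumed to come from some multi-modal encoder that is not shown. Intra-demonstration ordering is read here as the order of image and text inside one demonstration, and inter-demonstration ordering as the order of demonstrations in the prompt.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Demo:
    image_ref: str          # hypothetical stand-in for an image
    question: str
    answer: str
    embedding: List[float]  # assumed joint image+text embedding


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_emb: List[float], pool: List[Demo], k: int) -> List[Demo]:
    """Step 1: pick the k demonstrations most similar to the query."""
    ranked = sorted(pool, key=lambda d: cosine(query_emb, d.embedding), reverse=True)
    return ranked[:k]


def order_demos(demos: List[Demo], query_emb: List[float]) -> List[Demo]:
    """Step 2 (inter-demonstration): place the most similar demo last."""
    return sorted(demos, key=lambda d: cosine(query_emb, d.embedding))


def format_item(image_ref: str, question: str, answer: str = "",
                image_first: bool = True) -> str:
    """Step 2 (intra-demonstration): choose image-then-text or text-then-image."""
    image = f"<image:{image_ref}>"
    text = f"Question: {question}\nAnswer: {answer}".rstrip()
    return f"{image}\n{text}" if image_first else f"{text}\n{image}"


def build_prompt(instruction: str, demos: List[Demo],
                 query_image: str, query_question: str) -> str:
    """Step 3: introductory instruction, then demonstrations, then the query."""
    parts = [instruction]
    parts += [format_item(d.image_ref, d.question, d.answer) for d in demos]
    parts.append(format_item(query_image, query_question))
    return "\n\n".join(parts)


# Toy data: the embeddings are invented for illustration only.
pool = [
    Demo("cat.png", "What animal is shown?", "A cat", [1.0, 0.0]),
    Demo("dog.png", "What animal is shown?", "A dog", [0.8, 0.6]),
    Demo("car.png", "What vehicle is shown?", "A car", [0.0, 1.0]),
]
query_emb = [1.0, 0.1]
instruction = "Answer the question about each image."

selected = retrieve(query_emb, pool, k=2)
ordered = order_demos(selected, query_emb)
prompt = build_prompt(instruction, ordered, "kitten.png", "What animal is shown?")
print(prompt)
```

Each step is a separate function so that one factor (retriever, ordering, or instruction placement) can be varied while the others stay fixed, which is the kind of controlled comparison the abstract describes.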

Videos

What Factors Affect Multi-Modal In-Context Learning? An In-Depth Exploration · SlidesLive

Taxonomy

Topics: Speech and dialogue systems