Parallel In-context Learning for Large Vision Language Models

Shin'ya Yamaguchi; Daiki Chijiwa; Tamao Sakao; Taku Hasegawa

arXiv:2603.16092·cs.CV·March 18, 2026

Open Access

TL;DR

This paper introduces Parallel-ICL, a method that enables large vision-language models to process long demonstration contexts efficiently by parallelizing inference and combining predictions, maintaining accuracy while reducing latency.

Contribution

The paper proposes a novel parallel inference algorithm for multi-modal in-context learning that significantly reduces inference time without sacrificing performance.

Findings

01 Parallel-ICL achieves accuracy comparable to full-context ICL.

02 It significantly reduces inference latency.

03 The method is effective across multiple vision-language tasks.

Abstract

Large vision-language models (LVLMs) employ multi-modal in-context learning (MM-ICL) to adapt to new tasks by leveraging demonstration examples. While increasing the number of demonstrations boosts performance, it also incurs significant inference latency due to the quadratic computational cost of Transformer attention with respect to the context length. To address this trade-off, we propose Parallel In-Context Learning (Parallel-ICL), a plug-and-play inference algorithm. Parallel-ICL partitions the long demonstration context into multiple shorter, manageable chunks. It processes these chunks in parallel and integrates their predictions at the logit level, using a weighted Product-of-Experts (PoE) ensemble to approximate the full-context output. Guided by ensemble learning theory, we introduce principled strategies for Parallel-ICL: (i) clustering-based context chunking to maximize…
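
The logit-level combination described in the abstract can be illustrated with a small sketch. It assumes each chunk has already produced a vector of next-token logits, and it uses the standard weighted Product-of-Experts form: a weighted sum of per-chunk log-probabilities, renormalized over the vocabulary. The function names `log_softmax` and `weighted_poe` and the example weights are hypothetical. The paper's actual chunking and weighting strategies are not given in this excerpt, so this is not the authors' implementation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def weighted_poe(chunk_logits, weights):
    """Combine per-chunk next-token logits with a weighted Product-of-Experts.

    chunk_logits: one logit vector per demonstration chunk (same vocabulary).
    weights: one non-negative weight per chunk (hypothetical values here;
             how the paper derives its weights is not shown in the excerpt).
    Returns a probability distribution over the vocabulary.
    """
    if len(chunk_logits) != len(weights):
        raise ValueError("need exactly one weight per chunk")
    vocab_size = len(chunk_logits[0])
    combined = [0.0] * vocab_size
    for logits, w in zip(chunk_logits, weights):
        if len(logits) != vocab_size:
            raise ValueError("all chunks must share one vocabulary size")
        log_probs = log_softmax(logits)
        for i in range(vocab_size):
            # Product of p_k(y)^w_k becomes a weighted sum in log space.
            combined[i] += w * log_probs[i]
    # Renormalize so the product of experts is a valid distribution.
    return [math.exp(v) for v in log_softmax(combined)]


# Two chunks, three-token toy vocabulary, illustrative weights.
probs = weighted_poe([[2.0, 0.5, -1.0], [1.5, 1.0, -0.5]], [0.6, 0.4])
print(probs)
```

Because each chunk is scored independently, the per-chunk forward passes can run in parallel, and only the final weighted sum needs all of them. With a single chunk and weight 1, the sketch reduces to an ordinary softmax over that chunk's logits.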

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques