Enhancing Visual In-Context Learning by Multi-Faceted Fusion

Wenwen Liao; Jianbo Yu; Yuansong Wang; Qingchao Jiang; Xiaofeng Yang

arXiv:2601.10107 · cs.CV · January 16, 2026

TL;DR

This paper introduces a multi-faceted fusion framework for Visual In-Context Learning that leverages collaborative integration of multiple prompts, significantly improving model performance across various visual tasks.

Contribution

It proposes a novel multi-combination collaborative fusion approach and the MULTI-VQGAN architecture to better utilize diverse contextual prompts in VICL.

Findings

01. Enhanced performance on foreground segmentation, object detection, and image colorization.

02. Superior cross-task generalization and robustness.

03. More accurate predictions compared to existing prompt fusion methods.

Abstract

Visual In-Context Learning (VICL) has emerged as a powerful paradigm, enabling models to perform novel visual tasks by learning from in-context examples. The dominant "retrieve-then-prompt" approach typically relies on selecting the single best visual prompt, a practice that often discards valuable contextual information from other suitable candidates. While recent work has explored fusing the top-K prompts into a single, enhanced representation, this still collapses multiple rich signals into one, limiting the model's reasoning capability. We argue that a more multi-faceted, collaborative fusion is required to unlock the full potential of these diverse contexts. To address this limitation, we introduce a novel framework that moves beyond single-prompt fusion towards a multi-combination collaborative fusion. Instead of collapsing multiple prompts into one, our method generates…
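The two baseline strategies the abstract contrasts, picking the single best prompt versus fusing the top-K prompts into one representation, can be illustrated with a minimal sketch. This is not the paper's method: the cosine-similarity ranking, the element-wise mean used as the fusion step, the toy 2-D feature vectors, and the names `retrieve_top_k` and `mean_fuse` are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query, candidates, k):
    """Return indices of the k candidates most similar to the query (hypothetical retriever)."""
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: cosine(query, candidates[i]),
        reverse=True,
    )
    return ranked[:k]


def mean_fuse(features):
    """Collapse several feature vectors into one by element-wise averaging (assumed fusion rule)."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


# Toy features standing in for embedded query and candidate prompts.
query = [1.0, 0.0]
pool = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7], [1.0, 0.0]]

top1 = retrieve_top_k(query, pool, 1)        # single-best prompt selection
top3 = retrieve_top_k(query, pool, 3)        # top-K candidates
fused = mean_fuse([pool[i] for i in top3])   # K prompts collapsed into one vector
```

In this sketch the top-1 path keeps only one candidate, and the mean-fused path reduces three candidates to a single vector; both discard the distinctions between individual prompts, which is the limitation the abstract argues against.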

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications