Beyond Single Prompts: Synergistic Fusion and Arrangement for VICL

Wenwen Liao; Jianbo Yu; Yuansong Wang; Shifu Yan; Xiaofeng Yang

arXiv:2601.10117 · cs.CV · January 16, 2026

TL;DR

This paper introduces an end-to-end VICL framework that fuses multiple prompts and exploits their arrangements to improve how inpainting models adapt to new visual tasks, addressing the limitations of earlier single-prompt methods.

Contribution

The paper proposes an adaptive Fusion Module, arrangement-specific lightweight MLPs, and a bidirectional fine-tuning mechanism to improve prompt utilization and model adaptability in VICL.

Findings

01

Outperforms existing methods on foreground segmentation, detection, and colorization.

02

Demonstrates strong cross-task generalization.

03

Achieves superior results with minimal additional model complexity.

Abstract

Vision In-Context Learning (VICL) enables inpainting models to quickly adapt to new visual tasks from only a few prompts. However, existing methods suffer from two key issues: (1) selecting only the most similar prompt discards complementary cues from other high-quality prompts; and (2) they fail to exploit the structured information implied by different prompt arrangements. We propose an end-to-end VICL framework to overcome these limitations. First, an adaptive Fusion Module aggregates critical patterns and annotations from multiple prompts to form more precise contextual prompts. Second, we introduce arrangement-specific lightweight MLPs that decouple layout priors from the core model while leaving the core model largely unchanged. In addition, a bidirectional fine-tuning mechanism swaps the roles of query and prompt, encouraging the model to reconstruct the original prompt from…
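To make the multi-prompt idea concrete, here is a minimal sketch of fusing several prompts instead of keeping only the most similar one. It is not the paper's Fusion Module, which the abstract describes as adaptive (presumably learned); as a stand-in, it uses a fixed similarity-weighted average of prompt feature vectors. The function names (`cosine`, `softmax`, `fuse_prompts`), the `temperature` parameter, and the toy two-dimensional features are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def softmax(scores, temperature=1.0):
    """Numerically stable softmax; a lower temperature sharpens the weights."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_prompts(query_feat, prompt_feats, temperature=0.1):
    """Hypothetical stand-in for a fusion step (assumption, not the paper's module).

    Each prompt is weighted by its similarity to the query, and the prompt
    features are averaged with those weights, so less similar prompts still
    contribute instead of being discarded.
    """
    sims = [cosine(query_feat, p) for p in prompt_feats]
    weights = softmax(sims, temperature)
    dim = len(query_feat)
    fused = [
        sum(w * p[i] for w, p in zip(weights, prompt_feats))
        for i in range(dim)
    ]
    return fused, weights


# Toy features (assumed): one query and three candidate prompts.
query = [1.0, 0.0]
prompts = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
fused, weights = fuse_prompts(query, prompts)
```

With these toy inputs, the prompt identical to the query receives the largest weight, but the partially similar third prompt still shifts the fused vector. Top-1 selection would discard that contribution entirely.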

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications