UniICL: Systematizing Unified Multimodal In-context Learning through a Capability-Oriented Taxonomy

Yicheng Xu; Jiangning Zhang; Zhucun Xue; Teng Hu; Ran Yi; Xiaobin Hu; Yong Liu; Dacheng Tao

arXiv:2603.24690·cs.CV·March 27, 2026

UniICL: Systematizing Unified Multimodal In-context Learning through a Capability-Oriented Taxonomy

Yicheng Xu, Jiangning Zhang, Zhucun Xue, Teng Hu, Ran Yi, Xiaobin Hu, Yong Liu, Dacheng Tao

PDF

Open Access

TL;DR

This paper introduces a taxonomy and benchmark for understanding and improving multimodal in-context learning, proposing a new module that stabilizes adaptation and outperforms larger models on key tasks.

Contribution

It presents a capability-oriented taxonomy for analyzing multimodal in-context learning and introduces UniICL-760K and UniICL-Bench for systematic evaluation, along with a novel stabilizing module.

Findings

01

The taxonomy clarifies the functional roles of demonstrations in multimodal tasks.

02

The proposed module improves stability and performance in few-shot learning.

03

Our approach outperforms larger models on most understanding tasks.

Abstract

In-context Learning enables training-free adaptation via demonstrations but remains highly sensitive to example selection and formatting. In unified multimodal models spanning understanding and generation, this sensitivity is exacerbated by cross-modal interference and varying cognitive demands. Consequently, In-context Learning efficacy is often non-monotonic and highly task-dependent. To diagnose these behaviors, we introduce a six-level capability-oriented taxonomy that categorizes the functional role of demonstrations from basic perception to high-order discernment. Guided by this cognitive framework, we construct UniICL-760K, a large-scale corpus featuring curated 8-shot In-context Learning episodes across 15 subtasks, alongside UniICL-Bench for rigorous, controlled evaluation. As an architectural intervention to stabilize few-shot adaptation, we propose the Context-Adaptive…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMultimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling