HIFICL: High-Fidelity In-Context Learning for Multimodal Tasks

Xiaoyu Li; Yuhang Liu; Xuanshuo Kang; Zheng Luo; Fangqi Lou; Xiaohua Wu; Zihan Xiong

arXiv:2603.12760·cs.CV·March 30, 2026

TL;DR

HIFICL models in-context learning in large multimodal models more faithfully than shift-vector approximations by learning a set of virtual key-value pairs, parameterized with a low-rank factorization, and improves performance on multimodal benchmarks.

Contribution

The paper proposes HIFICL, which replaces the shift-vector approximation of ICL with a learnable context of virtual key-value pairs, trained end to end and using a low-rank factorization for stable, regularized training.

Findings

01

HIFICL outperforms existing approximation methods on multimodal benchmarks.

02

The method models the influence of demonstrations through learnable virtual key-value pairs rather than a single shift vector.

03

Code is publicly available in the GitHub repository bbbandari/HiFICL.

Abstract

In-Context Learning (ICL) is a significant paradigm for Large Multimodal Models (LMMs), using a few in-context demonstrations (ICDs) for new task adaptation. However, it is sensitive to demonstration configurations and computationally expensive. Mathematically, the influence of these demonstrations can be decomposed into a dynamic mixture of the standard attention output and the context values. Current approximation methods simplify this process by learning a "shift vector". Inspired by the exact decomposition, we introduce High-Fidelity In-Context Learning (HIFICL) to more faithfully model the ICL mechanism. HIFICL consists of three key components: 1) a set of "virtual key-value pairs" to act as a learnable context, 2) a low-rank factorization for stable and regularized training, and 3) a simple end-to-end training objective. From another perspective, this mechanism…
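The decomposition mentioned in the abstract can be checked numerically. The sketch below is not taken from the paper or its repository. It uses the standard softmax identity: attention over the concatenation of context and query tokens equals a query-dependent convex combination of attention over each part alone. The function names, the single-head single-query setup, and in particular `low_rank_virtual_kv` (one plausible reading of "virtual key-value pairs" with a low-rank factorization, using randomly initialized factors in place of trained ones) are assumptions made for illustration.

```python
import numpy as np

def softmax_attention(q, K, V):
    """Single-query softmax attention; also returns log(sum(exp(scores)))."""
    scores = K @ q / np.sqrt(q.shape[-1])   # one score per key
    m = scores.max()                        # subtract the max for numerical stability
    w = np.exp(scores - m)
    Z = w.sum()
    return (w / Z) @ V, m + np.log(Z)

def attention_with_context(q, K_x, V_x, K_c, V_c):
    """Attention over [context; query tokens], written as a two-term mixture."""
    out_x, logz_x = softmax_attention(q, K_x, V_x)   # standard attention, no context
    out_c, logz_c = softmax_attention(q, K_c, V_c)   # attention over the context only
    # alpha = Z_c / (Z_c + Z_x); it depends on q, so the mixture is dynamic.
    alpha = 1.0 / (1.0 + np.exp(logz_x - logz_c))
    return (1.0 - alpha) * out_x + alpha * out_c, alpha

def low_rank_virtual_kv(rng, n_virtual, d_model, rank):
    """Hypothetical parameterization: K_c = A_k @ B_k and V_c = A_v @ B_v."""
    A_k = rng.normal(size=(n_virtual, rank))
    B_k = rng.normal(size=(rank, d_model))
    A_v = rng.normal(size=(n_virtual, rank))
    B_v = rng.normal(size=(rank, d_model))
    return A_k @ B_k, A_v @ B_v

rng = np.random.default_rng(0)
d = 16
q = rng.normal(size=d)
K_x, V_x = rng.normal(size=(5, d)), rng.normal(size=(5, d))
K_c, V_c = low_rank_virtual_kv(rng, n_virtual=8, d_model=d, rank=2)

mixed, alpha = attention_with_context(q, K_x, V_x, K_c, V_c)
direct, _ = softmax_attention(q, np.vstack([K_c, K_x]), np.vstack([V_c, V_x]))
assert np.allclose(mixed, direct)   # the mixture form matches full attention
```

In this reading, a shift-vector method would replace the whole context term with one learned vector added to the output, whereas keeping explicit keys and values preserves the query-dependent weight `alpha`.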

Code & Models

Repositories

bbbandari/HiFICL (GitHub)
