MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples

Tao Chen, Enwei Zhang, Yuting Gao, Ke Li, Xing Sun, Yan Zhang, Hui Li, and Rongrong Ji

arXiv:2312.06363 · cs.AI · August 13, 2024 · 1 citation

TL;DR

This paper introduces MMICT, a multi-modal fine-tuning approach that uses a unified multi-modal feature module to exploit the in-context learning capability of multi-modal LLMs (MM-LLMs), improving performance across a range of downstream tasks.

Contribution

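The paper proposes MMICT, a multi-modal fine-tuning paradigm, together with the Multi-Modal Hub (M-Hub), a unified module that captures different multi-modal features depending on the input and objective. Built on M-Hub, MMICT lets MM-LLMs learn from in-context visual-guided textual features to improve downstream task performance.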

Findings

01

MMICT outperforms traditional fine-tuning methods.

02

MMICT surpasses vanilla in-context tuning with concatenated inputs.

03

Extensive experiments validate the effectiveness of MMICT.

Abstract

Although In-Context Learning (ICL) brings remarkable performance gains to Large Language Models (LLMs), the improvements remain lower than fine-tuning on downstream tasks. This paper introduces Multi-Modal In-Context Tuning (MMICT), a novel multi-modal fine-tuning paradigm that boosts multi-modal fine-tuning by fully leveraging the promising ICL capability of multi-modal LLMs (MM-LLMs). We propose the Multi-Modal Hub (M-Hub), a unified module that captures various multi-modal features according to different inputs and objectives. Based on M-Hub, MMICT enables MM-LLMs to learn from in-context visual-guided textual features and subsequently generate outputs conditioned on the textual-guided visual features. Moreover, leveraging the flexibility of M-Hub, we design a variety of in-context demonstrations. Extensive experiments on a diverse range of downstream multi-modal tasks demonstrate…
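The abstract describes how MMICT arranges its inputs: the model learns from in-context demonstrations given as visual-guided textual features, then generates the query's output conditioned on textual-guided visual features. The following is a minimal, hypothetical sketch of that sequence layout only. It assumes, as is common in in-context tuning but not confirmed by the abstract, that the training loss is computed solely on the query's answer tokens. Strings stand in for the feature vectors M-Hub would produce, and `Sample`, `build_in_context_sequence`, and `masked_mean` are illustrative names, not part of the kdegroup/mmict repository.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    # Hypothetical stand-in: placeholder strings instead of real feature vectors.
    # For demonstrations these represent visual-guided textual features;
    # for the query, textual-guided visual features.
    features: List[str]
    answer: List[str]  # target output tokens


def build_in_context_sequence(
    demos: List[Sample], query: Sample
) -> Tuple[List[str], List[bool]]:
    """Concatenate demonstrations and the query into one input sequence.

    Returns the sequence and a loss mask. ASSUMPTION: only the query's
    answer tokens are supervised; demonstrations act as context only.
    """
    tokens: List[str] = []
    loss_mask: List[bool] = []
    for demo in demos:
        segment = demo.features + demo.answer
        tokens.extend(segment)
        loss_mask.extend([False] * len(segment))  # demonstrations: no loss
    tokens.extend(query.features)
    loss_mask.extend([False] * len(query.features))  # query context: no loss
    tokens.extend(query.answer)
    loss_mask.extend([True] * len(query.answer))  # query answer: supervised
    return tokens, loss_mask


def masked_mean(per_token_loss: List[float], loss_mask: List[bool]) -> float:
    """Average per-token losses over supervised positions only."""
    if len(per_token_loss) != len(loss_mask):
        raise ValueError("loss and mask lengths differ")
    selected = [loss for loss, keep in zip(per_token_loss, loss_mask) if keep]
    if not selected:
        raise ValueError("no supervised positions")
    return sum(selected) / len(selected)


if __name__ == "__main__":
    demos = [Sample(["<vgt:img1>", "<vgt:txt1>"], ["a", "dog"])]
    query = Sample(["<tgv:img2>"], ["a", "cat"])
    tokens, mask = build_in_context_sequence(demos, query)
    print(tokens)  # ['<vgt:img1>', '<vgt:txt1>', 'a', 'dog', '<tgv:img2>', 'a', 'cat']
    print(mask)    # [False, False, False, False, False, True, True]
```

Leaving the demonstrations unsupervised trains the model to use them as context rather than to reproduce them; in a real MM-LLM the placeholders would be embedding tensors and the mask would select which positions enter the cross-entropy loss.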

Code & Models

Repositories

kdegroup/mmict (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning