EPIC: Efficient Prompt Interaction for Text-Image Classification

Xinyao Yu; Hao Sun; Zeyu Ling; Ziwei Niu; Zhenjia Bai; Rui Qin; Yen-Wei Chen; Lanfen Lin

arXiv:2507.07415·cs.CV·July 11, 2025

Open Access

TL;DR

EPIC introduces a prompt-based interaction method that makes multimodal text-image classification more efficient, reducing computational cost and the number of trainable parameters while maintaining or improving performance.

Contribution

The paper presents a novel prompt interaction strategy that significantly decreases the resource consumption and the number of trainable parameters needed for multimodal classification tasks.

Findings

01 Reduces computational resources and trainable parameters by about 99%.

02 Achieves superior performance on UPMC-Food101 and SNLI-VE datasets.

03 Maintains comparable performance on MM-IMDB dataset.

Abstract

In recent years, large-scale pre-trained multimodal models (LMMs) have emerged to integrate the vision and language modalities, achieving considerable success in multimodal tasks such as text-image classification. The growing size of LMMs, however, results in a significant computational cost for fine-tuning these models on downstream tasks. Hence, prompt-based interaction strategies are studied to align modalities more efficiently. In this context, we propose a novel efficient prompt-based multimodal interaction strategy, namely Efficient Prompt Interaction for text-image Classification (EPIC). Specifically, we utilize temporal prompts on intermediate layers and integrate different modalities with similarity-based prompt interaction, enabling sufficient information exchange between modalities. Utilizing this approach, our method achieves reduced computational resource consumption…
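
To make the phrase "similarity-based prompt interaction" concrete, here is a minimal sketch of one way such an exchange could work. It is not the authors' implementation: the abstract does not specify the update rule, so the cosine-similarity weighting, the softmax normalisation, the mixing factor `alpha`, and the function name `interact` are all assumptions made for illustration. The temporal prompts on intermediate layers are not modelled here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def interact(own_prompts, other_prompts, alpha=0.5):
    """Hypothetical update rule (an assumption, not taken from the paper):
    each prompt is blended with a similarity-weighted average of the
    other modality's prompts. alpha controls how much is absorbed."""
    updated = []
    for p in own_prompts:
        # Weight each prompt of the other modality by its similarity to p.
        weights = softmax([cosine(p, q) for q in other_prompts])
        mix = [sum(w * q[i] for w, q in zip(weights, other_prompts))
               for i in range(len(p))]
        updated.append([(1 - alpha) * a + alpha * b for a, b in zip(p, mix)])
    return updated

# Toy 2-dimensional prompts for each modality (made-up values).
text_prompts = [[1.0, 0.0], [0.0, 1.0]]
image_prompts = [[1.0, 0.0], [0.0, 1.0]]

new_text = interact(text_prompts, image_prompts)
new_image = interact(image_prompts, text_prompts)
```

In this sketch only the prompt vectors change, which matches the general idea of prompt-based tuning that the backbone stays frozen and only a small set of prompt parameters is trained.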

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning