TAI++: Text as Image for Multi-Label Image Classification by Co-Learning Transferable Prompt

Xiangyu Wu; Qing-Yuan Jiang; Yang Yang; Yi-Feng Wu; Qing-Guo Chen; Jianfeng Lu

arXiv:2405.06926·cs.CV·May 14, 2024

TL;DR

This paper introduces TAI++, an approach to multi-label image classification that uses text data as a stand-in for images, co-learning pseudo-visual prompts and text prompts on top of pre-trained vision-language models.

Contribution

The paper proposes a pseudo-visual prompt (PVP) module and a co-learning strategy with a dual-adapter module to improve visual knowledge transfer in multi-label image classification.

Findings

01

Outperforms state-of-the-art methods on the VOC2007, MS-COCO, and NUS-WIDE datasets.

02

Effectively mines diverse visual knowledge via pseudo-visual prompts.

03

Enhances visual representation abilities through co-learning with text prompts.

Abstract

The recent introduction of prompt tuning based on pre-trained vision-language models has dramatically improved the performance of multi-label image classification. However, some existing strategies that have been explored still have drawbacks, i.e., either exploiting massive labeled visual data at a high cost or using text data only for text prompt tuning and thus failing to learn the diversity of visual knowledge. Hence, the application scenarios of these methods are limited. In this paper, we propose a pseudo-visual prompt (PVP) module for implicit visual prompt tuning to address this problem. Specifically, we first learn the pseudo-visual prompt for each category, mining diverse visual knowledge by the well-aligned space of pre-trained vision-language models. Then, a co-learning strategy with a dual-adapter module is designed to transfer visual knowledge from pseudo-visual prompt to…

Code & Models

Repositories

Jinx630/Pseudo-Visual-Prompt
PyTorch · Official

Taxonomy

Topics: Text and Document Classification Technologies · Handwritten Text Recognition Techniques