IDEA: Increasing Text Diversity via Online Multi-Label Recognition for Vision-Language Pre-training

Xinyu Huang; Youcai Zhang; Ying Cheng; Weiwei Tian; Ruiwei Zhao; Rui Feng; Yuejie Zhang; Yaqian Li; Yandong Guo; Xiaobo Zhang

arXiv:2207.05333·cs.CV·August 2, 2022

TL;DR

IDEA enhances vision-language pre-training through online multi-label recognition, which increases text diversity and adds explicit supervision, improving downstream performance with minimal additional computation.

Contribution

The paper introduces IDEA, an online multi-label recognition method that extracts image tags from the paired texts and uses them to improve VLP without relying on a pre-defined object detector.
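To make the idea of extracting image tags from texts concrete, here is a minimal sketch, not the authors' implementation. The vocabulary `TAG_VOCAB` and the helpers `extract_tags` and `multi_hot` are hypothetical names introduced for illustration; the paper's actual tag set and matching rules are not given on this page.

```python
import re

# Hypothetical tag vocabulary (an assumption; the real tag set is not given here).
TAG_VOCAB = ["dog", "frisbee", "grass", "park", "person"]


def extract_tags(caption, vocab=TAG_VOCAB):
    """Return the vocabulary words that occur in the caption, in vocabulary order."""
    words = set(re.findall(r"[a-z]+", caption.lower()))
    return [tag for tag in vocab if tag in words]


def multi_hot(tags, vocab=TAG_VOCAB):
    """Encode a tag list as a 0/1 target vector aligned with the vocabulary."""
    present = set(tags)
    return [1 if tag in present else 0 for tag in vocab]


caption = "A dog catches a frisbee in the park."
tags = extract_tags(caption)   # tags mentioned in the text
target = multi_hot(tags)       # multi-label target for this image-text pair
print(tags, target)
```

Because the targets come from the caption itself, no object detector is involved; the trade-off is that tags the caption omits (here, "grass") are missing from the target.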

Findings

01 Significant performance improvements on multiple downstream datasets.

02 Efficient online image tag identification with minimal extra computation.

03 Enhanced text diversity improves model understanding and generalization.

Abstract

Vision-Language Pre-training (VLP) with large-scale image-text pairs has demonstrated superior performance in various fields. However, the image-text pairs co-occurrent on the Internet typically lack explicit alignment information, which is suboptimal for VLP. Existing methods proposed to adopt an off-the-shelf object detector to utilize additional image tag information. However, the object detector is time-consuming and can only identify the pre-defined object categories, limiting the model capacity. Inspired by the observation that the texts incorporate incomplete fine-grained image information, we introduce IDEA, which stands for increasing text diversity via online multi-label recognition for VLP. IDEA shows that multi-label learning with image tags extracted from the texts can be jointly optimized during VLP. Moreover, IDEA can identify valuable image tags online to provide more…
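The joint optimization and online tag identification mentioned in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's method: the binary cross-entropy formulation, the `weight` and `threshold` parameters, and the function names `multilabel_bce`, `joint_loss`, and `online_tags` are all hypothetical choices made for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent per-tag logits."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(logits)


def joint_loss(vlp_loss, logits, targets, weight=1.0):
    """Add the tag-recognition loss to an existing VLP loss (weight is an assumption)."""
    return vlp_loss + weight * multilabel_bce(logits, targets)


def online_tags(logits, vocab, threshold=0.5):
    """Keep tags whose predicted probability reaches the threshold."""
    return [tag for tag, z in zip(vocab, logits) if sigmoid(z) >= threshold]


# Hypothetical vocabulary, tag logits, and caption-derived targets for one image.
vocab = ["dog", "frisbee", "grass", "park", "person"]
logits = [3.0, 2.0, 1.5, 0.5, -2.0]
targets = [1, 1, 0, 1, 0]

loss = joint_loss(vlp_loss=1.0, logits=logits, targets=targets, weight=0.5)
predicted = online_tags(logits, vocab, threshold=0.7)
print(loss, predicted)
```

In this toy example the predicted tags include "grass", which the caption-derived targets do not contain. That is the kind of extra tag that could be fed back to diversify the text; how IDEA actually uses identified tags is not specified in the truncated abstract above.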


Code & Models

Repositories

xinyu1205/idea-pytorch (PyTorch, official)
