Too Large; Data Reduction for Vision-Language Pre-Training

Alex Jinpeng Wang; Kevin Qinghong Lin; David Junhao Zhang; Stan; Weixian Lei; Mike Zheng Shou

arXiv:2305.20087 · cs.CV · August 21, 2023 · 1 citation

TL;DR

This paper introduces TL;DR, a data reduction method for vision-language pre-training that compresses large datasets into smaller, high-quality sets, maintaining or improving model performance while significantly speeding up training.

Contribution

The paper proposes a novel data compression algorithm for VLP datasets that reduces dataset size while preserving or enhancing downstream task performance.

Findings

1. TL;DR compresses datasets by up to 85%.

2. Models trained on the compressed data achieve comparable or better results.

3. Training on the compressed data significantly accelerates pre-training.

Abstract

This paper examines two problems in widely used large-scale Vision-Language Pre-Training (VLP) datasets: severe image-text misalignment and high redundancy. To address them, we propose an efficient and straightforward vision-language learning algorithm called TL;DR, which compresses existing large VLP data into a small, high-quality set. Our approach consists of two major steps. First, a codebook-based encoder-decoder captioner is developed to select representative samples. Second, a new caption is generated to complement the original caption of each selected sample, mitigating the text-image misalignment problem while maintaining uniqueness. As a result, TL;DR reduces a large dataset to a small set of high-quality data, which can serve as an alternative pre-training dataset. This algorithm significantly speeds up the time-consuming pretraining…
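The two steps in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not specify how the codebook-based captioner scores samples, so plain k-means over precomputed feature vectors stands in for it here, and the sample nearest each cluster center is kept as the "representative". The function names (`select_representatives`, `complement_caption`), the farthest-point initialization, and the choice to merge captions by simple concatenation are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(points, k):
    """Deterministic farthest-point initialization (an assumption, not from the paper)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Pick the point whose nearest existing center is farthest away.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(points, k, iters=20):
    """Plain k-means; a stand-in for the paper's codebook-based grouping."""
    centers = init_centers(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[j].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def select_representatives(features, k):
    """Step 1 (sketch): keep the index of the sample nearest each cluster center."""
    centers = kmeans(features, k)
    assign = [min(range(k), key=lambda c: sq_dist(f, centers[c])) for f in features]
    chosen = []
    for j in range(k):
        members = [i for i, a in enumerate(assign) if a == j]
        if members:
            chosen.append(min(members, key=lambda i: sq_dist(features[i], centers[j])))
    return sorted(chosen)


def complement_caption(original, generated):
    """Step 2 (sketch): keep the original caption and append the generated one."""
    return f"{original.strip()} {generated.strip()}"
```

With `k` set to a fraction of the dataset size, `select_representatives` returns the indices of the retained image-text pairs, and `complement_caption` would then be applied to each retained pair using a caption from whatever captioning model is available. Keeping the original caption alongside the generated one mirrors the abstract's point that the new caption complements, rather than replaces, the original text.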

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques