NLIP: Noise-robust Language-Image Pre-training

Runhui Huang; Yanxin Long; Jianhua Han; Hang Xu; Xiwen Liang; Chunjing Xu; Xiaodan Liang

arXiv:2212.07086·cs.CV·January 5, 2023

TL;DR

NLIP introduces a noise-robust pre-training framework for image-text models that effectively handles noisy web data through noise-harmonization and noise-completion, leading to improved performance on various downstream tasks.

Contribution

The paper proposes a novel noise mitigation approach in cross-modal pre-training that jointly addresses incorrect and incomplete data issues without manual data cleaning.

Findings

01

Significant performance gains on zero-shot classification, captioning, and retrieval tasks.

02

Effective noise handling with only 26M image-text pairs of pre-training data, outperforming existing pre-trained models.

03

Enhanced robustness and stability in large-scale image-text pre-training.

Abstract

Large-scale cross-modal pre-training paradigms have recently shown ubiquitous success on a wide range of downstream tasks, e.g., zero-shot classification, retrieval and image captioning. However, their success relies heavily on the scale and quality of web-crawled data, which naturally contain incomplete and noisy information (e.g., wrong or irrelevant content). Existing works either design manual rules to clean data or generate pseudo-targets as auxiliary signals to reduce the impact of noise, and neither approach explicitly tackles the incorrect and incomplete challenges simultaneously. In this paper, to automatically mitigate the impact of noise solely by mining existing data, we propose a principled Noise-robust Language-Image Pre-training framework (NLIP) that stabilizes pre-training via two schemes: noise-harmonization and noise-completion. First, in the noise-harmonization scheme, NLIP…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Language-Image Pre-training