Contrastive Visual Data Augmentation

Yu Zhou; Bingxuan Li; Mohan Tang; Xiaomeng Jin; Te-Lin Wu; Kuan-Hao Huang; Heng Ji; Kai-Wei Chang; Nanyun Peng

arXiv:2502.17709 · cs.CV · June 6, 2025

TL;DR

This paper introduces CoDA, a contrastive visual data augmentation method that generates targeted synthetic data to help large multimodal models recognize novel and rare concepts, improving accuracy by up to 12.3% on NovelSpecies.

Contribution

The paper presents a novel contrastive data augmentation strategy that leverages multimodal generative models to improve recognition of unseen concepts in LMMs.

Findings

01 CoDA improves accuracy by up to 12.3% on NovelSpecies.

02 CoDA outperforms existing data augmentation methods on multiple datasets.

03 Human verification confirms the quality of the augmented data.

Abstract

Large multimodal models (LMMs) often struggle to recognize novel concepts, as they rely on pre-trained knowledge and have limited ability to capture subtle visual details. Domain-specific knowledge gaps in training also make them prone to confusing visually similar, commonly misrepresented, or low-resource concepts. To help LMMs better align nuanced visual features with language, improving their ability to recognize and reason about novel or rare concepts, we propose a Contrastive visual Data Augmentation (CoDA) strategy. CoDA extracts key contrastive textual and visual features of target concepts against the known concepts they are misrecognized as, and then uses multimodal generative models to produce targeted synthetic data. Automatic filtering of extracted features and augmented images is implemented to guarantee their quality, as verified by human annotators. We show the…
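The pipeline described in the abstract (extract contrastive features, filter them, then prompt a generative model) can be sketched in a few lines. This is only an illustrative toy under stated assumptions, not the authors' implementation: the `Concept` class, the helper functions, the set-difference extraction, the `keep` predicate, and the frog attributes are all hypothetical stand-ins for the model-based feature extraction and automatic filtering that the abstract mentions.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Concept:
    """Hypothetical container: a concept name plus textual visual attributes."""
    name: str
    features: FrozenSet[str]


def contrastive_features(target: Concept, confused: Concept) -> List[str]:
    # Assumption: a plain set difference stands in for model-based extraction
    # of features that separate the target from the concept it is mistaken for.
    return sorted(target.features - confused.features)


def filter_features(features: List[str], keep: Callable[[str], bool]) -> List[str]:
    # Assumption: `keep` stands in for the automatic quality filter.
    return [f for f in features if keep(f)]


def build_prompt(target: Concept, features: List[str]) -> str:
    # Compose a text prompt that a text-to-image model could consume.
    return f"A photo of a {target.name} with " + ", ".join(features)


# Made-up toy data, not taken from the paper or any dataset.
target = Concept(
    "novel frog",
    frozenset({"green skin", "orange toe pads", "translucent belly"}),
)
confused = Concept("tree frog", frozenset({"green skin", "webbed feet"}))

candidates = contrastive_features(target, confused)
# Toy filter rule: drop features that would not be visible from above.
kept = filter_features(candidates, keep=lambda f: "belly" not in f)
prompt = build_prompt(target, kept)
print(prompt)
```

In this toy run the shared attribute ("green skin") is discarded because it does not distinguish the two concepts, and only the surviving contrastive feature reaches the prompt. A second filtering pass over the generated images, which the abstract also mentions, is omitted here.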

Videos

Contrastive Visual Data Augmentation · slideslive

Taxonomy

Topics: Image Retrieval and Classification Techniques · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques

Methods: ALIGN