TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP

Yuliang Cai; Jesse Thomason; Mohammad Rostami

arXiv:2505.18434·cs.CV·May 27, 2025

Open Access

TL;DR

This paper introduces TNG-CLIP, a training-time negation data generation method that enhances CLIP's negation understanding with minimal additional training time, and presents a new benchmark for evaluating negation comprehension.

Contribution

The paper proposes a training-time negation data generation pipeline to improve negation understanding in vision-language models, and a new benchmark, Neg-TtoI, to evaluate how text-to-image generation models handle prompts containing negation.

Findings

01 TNG-CLIP achieves state-of-the-art results on negation benchmarks.

02 Negation data generation adds only 2.5% extra training time.

03 The Neg-TtoI benchmark assesses negation understanding in text-to-image generation models.

Abstract

Vision-language models (VLMs), such as CLIP, have demonstrated strong performance across a range of downstream tasks. However, CLIP is still limited in negation understanding: the ability to recognize the absence or exclusion of a concept. Existing methods address the problem by using a large language model (LLM) to generate large-scale data of image captions containing negation, which is then used to fine-tune CLIP. These methods, however, are both time- and compute-intensive, and their evaluations are typically restricted to image-text matching tasks. To expand the horizon, we (1) introduce a training-time negation data generation pipeline in which negation captions are generated during the training stage, adding only 2.5% to training time, and (2) propose the first benchmark, Neg-TtoI, for evaluating text-to-image generation models on prompts containing negation, assessing…
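To make the idea of generating negation captions during training more concrete, here is a minimal sketch. It is not the authors' pipeline, whose details are not given in the abstract. It assumes each training sample carries a set of object tags (the `objects` field is hypothetical), and it negates an object that appears elsewhere in the batch but not in the current image. The `Sample` class, the `make_negation_captions` function, and the "without a ..." template are all illustrative assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    caption: str
    objects: frozenset  # hypothetical per-image object tags (an assumption)


def make_negation_captions(batch, rng):
    """Return one caption per sample, negating an object absent from its image.

    Assumed template: "<caption>, without a <object>." The absent object is
    drawn from tags seen in other samples of the same batch.
    """
    out = []
    for i, sample in enumerate(batch):
        # Tags present elsewhere in the batch but not in this image.
        others = {o for j, s in enumerate(batch) if j != i for o in s.objects}
        pool = sorted(others - sample.objects)
        if not pool:
            out.append(sample.caption)  # nothing to negate; keep the caption
            continue
        absent = rng.choice(pool)
        out.append(f"{sample.caption.rstrip('.')}, without a {absent}.")
    return out


batch = [
    Sample("A dog on a sofa.", frozenset({"dog", "sofa"})),
    Sample("A cat near a lamp.", frozenset({"cat", "lamp"})),
]
caps = make_negation_captions(batch, random.Random(0))
```

Because the negated objects come from the batch that is already in memory, a scheme like this needs no separate LLM pass over the dataset, which is the kind of saving the abstract points to. Whether the actual method works this way is an assumption.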

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

Methods: Contrastive Language-Image Pre-training