NeIn: Telling What You Don't Want

Nhat-Tan Bui; Dinh-Hieu Hoang; Quoc-Huy Trinh; Minh-Triet Tran; Truong Nguyen; Susan Gauch

arXiv:2409.06481·cs.CV·April 8, 2025

TL;DR

This paper introduces NeIn, a large-scale dataset for studying negation in instruction-based image editing, revealing that current vision-language models struggle with negative queries.

Contribution

The paper presents the first extensive dataset for negation in image editing and an evaluation protocol, highlighting challenges in current models' understanding of negation.

Findings

01 State-of-the-art VLMs perform poorly on negation queries.

02 The NeIn dataset contains 366,957 samples: 342,775 for training and 24,182 for benchmarking.

03 The evaluation protocol enables systematic assessment of negation understanding.

Abstract

Negation is a fundamental linguistic concept used by humans to convey information that they do not desire. Despite this, minimal research has focused on negation within text-guided image editing. This lack of research means that vision-language models (VLMs) for image editing may struggle to understand negation, implying that they struggle to provide accurate results. One barrier to achieving human-level intelligence is the lack of a standard collection by which research into negation can be evaluated. This paper presents the first large-scale dataset, Negative Instruction (NeIn), for studying negation within instruction-based image editing. Our dataset comprises 366,957 quintuplets, i.e., source image, original caption, selected object, negative sentence, and target image in total, including 342,775 queries for training and 24,182 queries for benchmarking image editing methods.…
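
The quintuplet structure and split sizes described in the abstract can be sketched as a small record type. This is only an illustration: the class name, field names, file paths, and example values are hypothetical and are not taken from the released dataset, whose actual schema may differ. Only the five components and the three counts come from the abstract.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class NeInSample:
    """One quintuplet as listed in the abstract (field names are assumptions)."""
    source_image: str       # identifier or path of the source image
    original_caption: str   # caption describing the source image
    selected_object: str    # object that the negative sentence refers to
    negative_sentence: str  # instruction stating what is NOT wanted
    target_image: str       # identifier or path of the target image


# Counts quoted in the abstract.
TRAIN_QUERIES = 342_775
BENCHMARK_QUERIES = 24_182
TOTAL_QUINTUPLETS = 366_957


def splits_are_consistent() -> bool:
    """Check that the two splits add up to the reported total."""
    return TRAIN_QUERIES + BENCHMARK_QUERIES == TOTAL_QUINTUPLETS


def is_complete(sample: NeInSample) -> bool:
    """Return True when every component of the quintuplet is non-empty."""
    return all(getattr(sample, f.name) for f in fields(sample))


# Made-up placeholder values, for illustration only.
example = NeInSample(
    source_image="source/000001.jpg",
    original_caption="A man sitting on a bench in a park.",
    selected_object="dog",
    negative_sentence="The image doesn't have any dogs.",
    target_image="target/000001.jpg",
)

print(splits_are_consistent(), is_complete(example))
```

The consistency check simply confirms that the training and benchmarking counts in the abstract sum to the stated total.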

Code & Models

Datasets

nhatttanbui/NeIn
dataset · 46 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Cell Image Analysis Techniques

Methods: BLIP: Bootstrapping Language-Image Pre-training