NeIn: Telling What You Don't Want
Nhat-Tan Bui, Dinh-Hieu Hoang, Quoc-Huy Trinh, Minh-Triet Tran, Truong Nguyen, Susan Gauch

TL;DR
This paper introduces NeIn, a large-scale dataset for studying negation in instruction-based image editing, revealing that current vision-language models struggle with negative queries.
Contribution
The paper presents the first extensive dataset for negation in image editing and an evaluation protocol, highlighting challenges in current models' understanding of negation.
Findings
State-of-the-art VLMs perform poorly on negation queries.
The NeIn dataset contains 366,957 samples: 342,775 for training and 24,182 for benchmarking.
The evaluation protocol enables systematic assessment of negation understanding.
Abstract
Negation is a fundamental linguistic concept used by humans to convey information that they do not desire. Despite this, minimal research has focused on negation within text-guided image editing. This lack of research means that vision-language models (VLMs) for image editing may struggle to understand negation, implying that they struggle to provide accurate results. One barrier to achieving human-level intelligence is the lack of a standard collection by which research into negation can be evaluated. This paper presents the first large-scale dataset, Negative Instruction (NeIn), for studying negation within instruction-based image editing. Our dataset comprises 366,957 quintuplets, i.e., source image, original caption, selected object, negative sentence, and target image in total, including 342,775 queries for training and 24,182 queries for benchmarking image editing methods.…
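The quintuplet structure named in the abstract can be pictured with a small sketch. This is only an illustration, not the authors' data format: the class name `NeInSample`, the field names, the use of string paths for images, and the example values are all assumptions made here. Only the five components and the three counts come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NeInSample:
    """One quintuplet, following the five components listed in the abstract.

    Field names and types are hypothetical; the abstract does not say how
    the released files are organized.
    """
    source_image: str       # assumed: a path or ID for the source image
    original_caption: str   # caption of the original image
    selected_object: str    # the object the negative sentence refers to
    negative_sentence: str  # the negative instruction given to the editor
    target_image: str       # assumed: a path or ID for the target image


# Counts as reported in the abstract.
TRAIN_QUERIES = 342_775
BENCHMARK_QUERIES = 24_182
TOTAL_QUINTUPLETS = 366_957


def splits_are_consistent() -> bool:
    """Check that the two reported splits add up to the reported total."""
    return TRAIN_QUERIES + BENCHMARK_QUERIES == TOTAL_QUINTUPLETS


# Invented placeholder values, not taken from the dataset.
example = NeInSample(
    source_image="source_000001.jpg",
    original_caption="A dog sitting on a sofa.",
    selected_object="cat",
    negative_sentence="The image doesn't have any cat.",
    target_image="target_000001.jpg",
)

if __name__ == "__main__":
    print(example.negative_sentence)
    print("splits consistent:", splits_are_consistent())
```

The consistency check simply confirms that the training and benchmarking counts in the abstract sum to the stated total of 366,957.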
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Cell Image Analysis Techniques
Methods: BLIP (Bootstrapping Language-Image Pre-training)
