Negation-Aware Test-Time Adaptation for Vision-Language Models
Haochen Han, Alex Jinpeng Wang, Fangming Liu, Jun Zhu

TL;DR
This paper introduces NEAT, a low-resource, test-time adaptation method that improves negation understanding in vision-language models by addressing distribution shifts without extensive retraining.
Contribution
The paper proposes a novel negation-aware test-time adaptation method that efficiently adjusts VLMs for negation understanding with minimal additional parameters.
Findings
NEAT achieves performance comparable to or better than state-of-the-art methods.
It requires less than 0.01% of trainable parameters.
Extensive experiments validate NEAT's effectiveness across negation tasks.
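To give a sense of scale for the 0.01% figure, the arithmetic can be sketched as below. This reads the figure as a share of the model's total parameters, which is an assumption, and the 150-million-parameter model size is a hypothetical round number chosen for illustration, not a value reported in the paper. The function names are likewise made up for this sketch.

```python
def trainable_percent(trainable: int, total: int) -> float:
    """Return the share of parameters being updated, as a percentage of the total."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= trainable <= total:
        raise ValueError("trainable must lie between 0 and total")
    return 100.0 * trainable / total


def is_within_budget(trainable: int, total: int, budget_percent: float = 0.01) -> bool:
    """Check whether the trainable share stays strictly below the given budget."""
    return trainable_percent(trainable, total) < budget_percent


# Hypothetical model size (assumption, not taken from the paper).
TOTAL_PARAMS = 150_000_000

# At this size, 0.01% corresponds to 15,000 parameters.
print(trainable_percent(15_000, TOTAL_PARAMS))

# 10,000 trainable parameters would fall under the budget; 20,000 would not.
print(is_within_budget(10_000, TOTAL_PARAMS))
print(is_within_budget(20_000, TOTAL_PARAMS))
```

Under these assumptions, a 0.01% budget leaves room for only a few thousand updated values, which is far smaller than full fine-tuning of the model.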
Abstract
In this paper, we study a practical but less-touched problem in Vision-Language Models (VLMs), i.e., negation understanding. Specifically, many real-world applications require models to explicitly identify what is false or non-existent, e.g., radiologists may search for images that exclude specific conditions. Despite the impressive transferability of VLMs through large-scale training, they suffer from a critical limitation: they fail to handle negation. To address this challenge, existing methods attribute its root cause to the scarcity of negation training data and propose to fine-tune VLMs on massive data containing explicit negation. Undoubtedly, such data-centric solutions demand substantial data and computational resources, limiting their sustainable widespread adoption. To tackle negation in a low-carbon manner, we empirically observe that the key obstacle lies in the dual-concept…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
