When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise
Philip Wootaek Shin, Ajay Narayanan Sridhar, Sivani Devarapalli, Rui Zhang, Jack Sampson, Vijaykrishnan Narayanan

TL;DR
This paper investigates how visual distortions like rotation and noise impair relation reasoning in vision-language models, revealing a robustness gap and the limited effectiveness of current mitigation strategies.
Contribution
It provides a comprehensive analysis of how visual perturbations affect relational reasoning in VLMs and evaluates mitigation techniques (prompt-based augmentation, orientation correction, and denoising) that turn out to be only partially effective.
Findings
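Even mild rotation and noise significantly degrade relational reasoning across models and datasets.
Prompt-based augmentation and preprocessing (orientation correction and denoising) offer only partial improvements and do not fully resolve hallucinations.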
A gap exists between perceptual robustness and relational understanding.
Abstract
Vision-language models (VLMs) achieve strong multimodal performance but remain prone to relation hallucination, a failure mode in tasks that require accurate reasoning over inter-object interactions. We study the impact of visual perturbations, specifically rotation and noise, and show that even mild distortions significantly degrade relational reasoning across models and datasets. We further evaluate prompt-based augmentation and preprocessing strategies (orientation correction and denoising), finding that while they offer partial improvements, they do not fully resolve hallucinations. Our results reveal a gap between perceptual robustness and relational understanding, highlighting the need for more robust, geometry-aware VLMs.
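To make the perturbation setup concrete, here is a minimal sketch of the two distortions and the orientation-correction step. It is an illustration, not the authors' pipeline: the abstract does not state the rotation angles, noise model, or noise levels used, so quarter-turn rotations, additive Gaussian noise, the function names, and the nested-list grayscale image are all assumptions made for this example.

```python
import random

# Assumption: a grayscale image stored as rows of integer pixels in 0..255.
Image = list[list[int]]


def rotate_quarter_turns(image: Image, turns: int) -> Image:
    """Rotate clockwise by 90 degrees * turns (negative turns go counter-clockwise)."""
    for _ in range(turns % 4):
        # Reversing the rows and transposing gives one clockwise quarter turn.
        image = [list(row) for row in zip(*image[::-1])]
    return [list(row) for row in image]


def add_gaussian_noise(image: Image, sigma: float, seed: int = 0) -> Image:
    """Add zero-mean Gaussian noise to every pixel and clip to the 0..255 range."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    return [
        [min(255, max(0, round(pixel + rng.gauss(0.0, sigma)))) for pixel in row]
        for row in image
    ]


def correct_orientation(image: Image, turns: int) -> Image:
    """Undo a known rotation; a real system would first have to estimate `turns`."""
    return rotate_quarter_turns(image, -turns)


if __name__ == "__main__":
    clean = [[10, 20, 30], [40, 50, 60]]
    rotated = rotate_quarter_turns(clean, 1)
    noisy = add_gaussian_noise(clean, sigma=25.0)
    restored = correct_orientation(rotated, 1)
    print(rotated, noisy, restored, sep="\n")
```

In an evaluation along these lines, the same relation question would be posed on the clean, perturbed, and restored images, and the answers compared. The abstract reports that such preprocessing helps only partially.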
