When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise

Philip Wootaek Shin; Ajay Narayanan Sridhar; Sivani Devarapalli; Rui Zhang; Jack Sampson; Vijaykrishnan Narayanan

arXiv:2605.05045 · cs.CV · May 12, 2026

TL;DR

This paper investigates how visual distortions like rotation and noise impair relation reasoning in vision-language models, revealing a robustness gap and the limited effectiveness of current mitigation strategies.

Contribution

The paper analyzes how visual perturbations (rotation and noise) affect relational reasoning in VLMs, and evaluates prompt-based augmentation and preprocessing (orientation correction and denoising) as mitigation strategies.

Findings

01 Even mild rotation and noise significantly degrade relational reasoning across models and datasets.

02 Prompt-based augmentation and preprocessing (orientation correction and denoising) offer only partial improvements.

03 A gap exists between perceptual robustness and relational understanding.

Abstract

Vision-language models (VLMs) achieve strong multimodal performance but remain prone to relation hallucination, that is, errors on queries that require accurate reasoning over inter-object interactions. We study the impact of visual perturbations, specifically rotation and noise, and show that even mild distortions significantly degrade relational reasoning across models and datasets. We further evaluate prompt-based augmentation and preprocessing strategies (orientation correction and denoising), finding that while they offer partial improvements, they do not fully resolve hallucinations. Our results reveal a gap between perceptual robustness and relational understanding, highlighting the need for more robust, geometry-aware VLMs.
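To make the two perturbations named in the abstract concrete, here is a minimal Python sketch of rotating an image and adding noise before it would be passed to a VLM, plus the inverse rotation as a stand-in for orientation correction. This is an illustration only, not the authors' pipeline: the function names, the restriction to 90-degree rotation steps, the Gaussian noise model, and the `sigma` values are all assumptions, since the abstract does not specify angles, noise types, or magnitudes.

```python
import numpy as np


def rotate_image(image: np.ndarray, quarter_turns: int) -> np.ndarray:
    """Rotate an H x W x C image counter-clockwise in 90-degree steps.

    Assumption: only multiples of 90 degrees are handled here; the
    paper may use other angles.
    """
    return np.rot90(image, k=quarter_turns, axes=(0, 1))


def undo_rotation(image: np.ndarray, quarter_turns: int) -> np.ndarray:
    """Hypothetical orientation correction: apply the inverse rotation.

    Assumes the rotation amount is already known; estimating it from
    the image is a separate problem not covered by this sketch.
    """
    return np.rot90(image, k=-quarter_turns, axes=(0, 1))


def add_gaussian_noise(image: np.ndarray, sigma: float, seed: int = 0) -> np.ndarray:
    """Add zero-mean Gaussian noise to a uint8 image.

    Assumption: sigma is given in 0-255 pixel units, and Gaussian noise
    is used as one plausible noise model.
    """
    rng = np.random.default_rng(seed)
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, size=image.shape)
    # Clip back into the valid pixel range before converting to uint8.
    return np.clip(noisy, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    # Synthetic 4 x 6 RGB image standing in for a real benchmark image.
    image = np.full((4, 6, 3), 128, dtype=np.uint8)
    rotated = rotate_image(image, quarter_turns=1)
    noisy = add_gaussian_noise(image, sigma=10.0)
    print(rotated.shape, noisy.dtype)
```

In an evaluation along the lines the abstract describes, the same relational question would be asked about the original, the perturbed, and the corrected image, and the answers compared.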
