Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting
Jiarui Wu, Zhuo Liu, Hangfeng He

TL;DR
This paper introduces a constraint-aware prompting framework that reduces spatial relation hallucinations in large vision-language models by enforcing bidirectional and transitivity constraints, leading to more coherent spatial predictions.
Contribution
The paper proposes a novel constraint-aware prompting method with bidirectional and transitivity constraints to mitigate spatial hallucinations in LVLMs.
Findings
Improved spatial relation accuracy on three datasets
Enhanced consistency in object relation predictions
Systematic analysis of constraint effectiveness
Abstract
Spatial relation hallucinations pose a persistent challenge in large vision-language models (LVLMs), leading them to generate incorrect predictions about object positions and spatial configurations within an image. To address this issue, we propose a constraint-aware prompting framework designed to reduce spatial relation hallucinations. Specifically, we introduce two types of constraints: (1) bidirectional constraint, which ensures consistency in pairwise object relations, and (2) transitivity constraint, which enforces relational dependence across multiple objects. By incorporating these constraints, LVLMs can produce more spatially coherent and consistent outputs. We evaluate our method on three widely used spatial relation datasets, demonstrating performance improvements over existing approaches. Additionally, a systematic analysis of various bidirectional relation analysis choices and…
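To make the two constraints concrete, here is a minimal Python sketch. It is not the authors' implementation. The relation vocabulary in `INVERSE`, the prompt wording in `build_prompt`, and all function names are assumptions made for illustration. The sketch shows (a) how constraint text might be appended to a question, and (b) how a set of predicted relations could be checked for bidirectional and transitivity violations.

```python
from itertools import permutations

# Hypothetical inverse-relation table; the paper's relation vocabulary may differ.
INVERSE = {
    "left of": "right of",
    "right of": "left of",
    "above": "below",
    "below": "above",
}


def build_prompt(question):
    """Append illustrative constraint text to a question (wording is an assumption)."""
    return (
        f"{question}\n"
        "Constraints:\n"
        "1. Bidirectional: if A is left of B, then B is right of A.\n"
        "2. Transitivity: if A is left of B and B is left of C, "
        "then A is left of C.\n"
    )


def bidirectional_violations(preds):
    """preds maps (subject, object) -> relation.

    Return each pair (a, b), reported once with a < b, whose two
    directions are not inverses of each other.
    """
    violations = []
    for (a, b), rel in preds.items():
        reverse = preds.get((b, a))
        if a < b and reverse is not None and reverse != INVERSE[rel]:
            violations.append((a, b))
    return violations


def transitivity_violations(preds):
    """Return triples (a, b, c) where a-b and b-c share a relation but a-c differs."""
    violations = []
    objects = sorted({name for pair in preds for name in pair})
    for a, b, c in permutations(objects, 3):
        r_ab = preds.get((a, b))
        r_bc = preds.get((b, c))
        r_ac = preds.get((a, c))
        if r_ab is not None and r_ab == r_bc and r_ac is not None and r_ac != r_ab:
            violations.append((a, b, c))
    return violations


# Toy predictions containing one violation of each kind.
preds = {
    ("cup", "plate"): "left of",
    ("plate", "cup"): "left of",    # contradicts the line above
    ("plate", "fork"): "left of",
    ("cup", "fork"): "right of",    # contradicts cup < plate < fork
}
```

With these toy predictions, `bidirectional_violations(preds)` flags the cup/plate pair and `transitivity_violations(preds)` flags the cup, plate, fork triple. A checker like this only detects inconsistencies after the fact. The paper's approach, as described in the abstract, places the constraints in the prompt itself.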
Taxonomy
Topics: Data Visualization and Analytics · Psychiatry, Mental Health, Neuroscience
