Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions
Mohammadmostafa Rostamkhani, Baktash Ansari, Hoorieh Sabzevari, Farzan Rahmani, Sauleh Eetemadi

TL;DR
This paper introduces the Illusory VQA task and datasets for evaluating multimodal models on visual illusions, showing that simple preprocessing techniques can significantly improve model performance and bring models closer to human-like perception.
Contribution
It presents the first specialized datasets for Illusory VQA, evaluates state-of-the-art models, and proposes a low-pass filter method to enhance illusion recognition without extensive fine-tuning.
Findings
Gaussian and blur filters improve model performance on illusions
BLIP-2 outperforms humans on IllusionAnimals without fine-tuning
Fine-tuning and preprocessing increase model robustness to illusions
Abstract
In recent years, Visual Question Answering (VQA) has made significant strides, particularly with the advent of multimodal models that integrate vision and language understanding. However, existing VQA datasets often overlook the complexities introduced by image illusions, which pose unique challenges for both human perception and model interpretation. In this study, we introduce a novel task called Illusory VQA, along with four specialized datasets: IllusionMNIST, IllusionFashionMNIST, IllusionAnimals, and IllusionChar. These datasets are designed to evaluate the performance of state-of-the-art multimodal models in recognizing and interpreting visual illusions. We assess the zero-shot performance of various models, fine-tune selected models on our datasets, and propose a simple yet effective solution for illusion detection using Gaussian and blur low-pass filters. We show that this…
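The abstract names Gaussian and blur low-pass filters as the preprocessing step without detailing them. The sketch below is a minimal pure-Python illustration under stated assumptions: "blur" is taken to mean a mean (box) filter, the radius and sigma values are illustrative rather than the paper's settings, and the function names (gaussian_kernel, box_kernel, convolve, low_pass) are hypothetical. A low-pass filter replaces each pixel with a weighted average of its neighbours, which suppresses fine high-frequency texture while keeping coarse shapes, the kind of structure a hidden digit or animal in an illusion image is assumed to carry.

```python
import math
from typing import List

# Grayscale image as a list of rows; values are assumed to lie in [0, 255].
Image = List[List[float]]
Kernel = List[List[float]]


def gaussian_kernel(radius: int, sigma: float) -> Kernel:
    """Normalized (2*radius+1) x (2*radius+1) Gaussian kernel (hypothetical helper)."""
    size = 2 * radius + 1
    raw = [
        [math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2.0 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in raw)
    # Normalize so the weights sum to 1 and overall brightness is preserved.
    return [[v / total for v in row] for row in raw]


def box_kernel(radius: int) -> Kernel:
    """Mean (box) blur kernel: every neighbour gets the same weight (hypothetical helper)."""
    size = 2 * radius + 1
    weight = 1.0 / (size * size)
    return [[weight] * size for _ in range(size)]


def convolve(image: Image, kernel: Kernel) -> Image:
    """Filter `image` with a symmetric `kernel`, replicating edge pixels at the borders."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    # Clamp coordinates so out-of-range neighbours reuse the nearest edge pixel.
                    y = min(max(i + di, 0), h - 1)
                    x = min(max(j + dj, 0), w - 1)
                    acc += image[y][x] * kernel[di + r][dj + r]
            out[i][j] = acc
    return out


def low_pass(image: Image, method: str = "gaussian", radius: int = 2, sigma: float = 1.5) -> Image:
    """Hypothetical preprocessing step applied before the image is given to a VQA model."""
    if method == "gaussian":
        kernel = gaussian_kernel(radius, sigma)
    elif method == "box":
        kernel = box_kernel(radius)
    else:
        raise ValueError(f"unknown method: {method!r}")
    return convolve(image, kernel)


if __name__ == "__main__":
    # Toy 8x8 checkerboard: pure high-frequency texture alternating 0 and 255.
    board = [[255.0 * ((i + j) % 2) for j in range(8)] for i in range(8)]
    blurred = low_pass(board, method="box", radius=1)
    print(round(blurred[3][3], 2), round(blurred[3][4], 2))  # prints 113.33 141.67
```

On the toy checkerboard, a 3x3 box blur collapses the 0/255 alternation to values near the mean (113.33 and 141.67), which is how fine texture gets suppressed. For real images, a library filter such as Pillow's ImageFilter.GaussianBlur or ImageFilter.BoxBlur would be the practical choice before the image is passed to a model such as BLIP-2.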
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Vision and Imaging
