Evaluating Vision-Language Models on Bistable Images

Artemis Panagopoulou; Coby Melkin; Chris Callison-Burch

arXiv:2405.19423·cs.CV·May 31, 2024

TL;DR

This paper extensively evaluates vision-language models on bistable images, revealing biases, differences from human perception, and the influence of prompts and labels, with all resources openly available.

Contribution

It provides the most comprehensive analysis to date of vision-language models' responses to bistable images, including a new dataset and insights into model biases and language influence.

Findings

01. Most models prefer one interpretation over another

02. Models show minimal variance under image manipulations

03. Models differ from human perception and biases

Abstract

Bistable images, also known as ambiguous or reversible images, present visual stimuli that can be seen in two distinct interpretations, though not simultaneously by the observer. In this study, we conduct the most extensive examination of vision-language models using bistable images to date. We manually gathered a dataset of 29 bistable images, along with their associated labels, and subjected them to 116 different manipulations in brightness, tint, and rotation. We evaluated twelve different models in both classification and generative tasks across six model architectures. Our findings reveal that, with the exception of models from the Idefics family and LLaVA1.5-13b, there is a pronounced preference for one interpretation over another among the models, and minimal variance under image manipulations, with few exceptions on image rotations. Additionally, we compared the model…
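As a rough illustration of what "a pronounced preference for one interpretation" and "minimal variance under image manipulations" could mean numerically, the following is a minimal sketch, not the authors' evaluation code. It assumes a classification-style setup in which a model returns one raw score per interpretation label; the function names, the softmax normalization, and the example scores are all hypothetical and chosen only for illustration.

```python
import math
from statistics import pvariance


def preference(score_a: float, score_b: float) -> float:
    """Softmax-normalized probability of interpretation A from two raw scores.

    Assumption: the model exposes one scalar score per label (for example an
    image-text similarity). 0.5 means no preference; values near 0 or 1 mean
    a strong preference for B or A respectively.
    """
    m = max(score_a, score_b)  # subtract the max for numerical stability
    exp_a = math.exp(score_a - m)
    exp_b = math.exp(score_b - m)
    return exp_a / (exp_a + exp_b)


def summarize(scores_by_manipulation: dict) -> dict:
    """Summarize preference for interpretation A across manipulated copies.

    `scores_by_manipulation` maps a manipulation name (hypothetical, e.g.
    "rotate_90") to a (score_a, score_b) pair for the same bistable image.
    """
    prefs = {
        name: preference(a, b)
        for name, (a, b) in scores_by_manipulation.items()
    }
    values = list(prefs.values())
    return {
        "mean_preference_a": sum(values) / len(values),
        # Population variance: low values mean the manipulations barely
        # change which interpretation the model favors.
        "variance": pvariance(values),
        "per_manipulation": prefs,
    }


# Made-up scores for one image under three conditions (illustrative only).
example = {
    "original": (2.0, 0.5),
    "brighter": (2.1, 0.6),
    "rotate_90": (0.4, 0.9),
}
summary = summarize(example)
```

Under this sketch, a model with a stable bias would show a mean preference far from 0.5 together with a small variance, while a condition that flips the favored label (as some rotations reportedly do) would show up as an outlier in `per_manipulation`.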

Code & Models

Repositories

artemisp/bistable-illusions-mllms (official)

Taxonomy

Topics: Geographic Information Systems Studies · Religious Tourism and Spaces