MEBench: A Novel Benchmark for Understanding Mutual Exclusivity Bias in Vision-Language Models

Anh Thai; Stefan Stojanov; Zixuan Huang; Bikram Boote; James M. Rehg

arXiv:2505.20122·cs.CV·April 17, 2026

TL;DR

MEBench is a new benchmark designed to evaluate mutual exclusivity bias in vision-language models, incorporating spatial reasoning and a scalable data pipeline for realistic scene understanding.

Contribution

The paper introduces MEBench, a benchmark with a data generation pipeline and novel metrics to assess ME bias and spatial reasoning in vision-language models.

Findings

01. VLMs exhibit weak mutual exclusivity (ME) bias.

02. Models show some ability to use additional spatial context to resolve ambiguity when multiple novel objects are present.

03. MEBench provides a challenging environment for ME reasoning.

Abstract

This paper introduces MEBench, a novel benchmark for evaluating mutual exclusivity (ME) bias, a cognitive phenomenon observed in children during word learning. Unlike traditional ME tasks, MEBench also incorporates spatial reasoning, which creates more challenging and realistic evaluation settings. To facilitate controlled experimentation, we also present a flexible and scalable data generation pipeline that supports the construction of diverse annotated scenes. We assess the performance of various vision-language models (VLMs) on this benchmark using novel evaluation metrics that capture key aspects of ME-based reasoning. We find that these VLMs exhibit weak ME bias, while showing some ability to leverage additional spatial context to resolve ambiguity in settings with multiple novel objects. Project page: http://mebench.github.io/.
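To make the idea of ME bias concrete, here is a minimal sketch of how a preference for novel objects could be scored. In an ME-style trial, a model hears an unfamiliar word and must pick one object from a scene; a learner with ME bias tends to pick an object that does not already have a known name. The `Trial` structure, the function names, and the toy data are all hypothetical illustrations, and this is not claimed to be one of the metrics defined in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    """One hypothetical ME trial (illustrative structure, not the paper's format)."""
    is_novel: List[bool]  # one flag per candidate object in the scene
    chosen: int           # index of the object the model picked for the novel word


def novel_selection_rate(trials: List[Trial]) -> float:
    """Fraction of trials in which the model picked a novel object."""
    hits = sum(1 for t in trials if t.is_novel[t.chosen])
    return hits / len(trials)


def chance_rate(trials: List[Trial]) -> float:
    """Expected novel-selection rate of a model that picks uniformly at random."""
    per_trial = [sum(t.is_novel) / len(t.is_novel) for t in trials]
    return sum(per_trial) / len(per_trial)


# Toy data, invented purely for illustration.
trials = [
    Trial(is_novel=[False, False, True], chosen=2),
    Trial(is_novel=[False, True, False, False], chosen=0),
    Trial(is_novel=[True, False], chosen=0),
    Trial(is_novel=[False, False, True, True], chosen=3),
]

rate = novel_selection_rate(trials)
chance = chance_rate(trials)
print(f"novel-selection rate: {rate:.3f}, chance: {chance:.3f}")
```

Under this assumed scoring, a rate well above the chance baseline would suggest ME-like behavior, while a rate near chance would correspond to the weak ME bias the abstract reports. Note that the last toy trial contains two novel objects, which is the kind of ambiguity the abstract says spatial context may help resolve; this sketch does not model that.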

Code & Models

Repositories

http://mebench.github.io
github
