TL;DR
MEBench is a new benchmark designed to evaluate mutual exclusivity bias in vision-language models, incorporating spatial reasoning and a scalable data pipeline for realistic scene understanding.
Contribution
The paper introduces MEBench, a benchmark with a data generation pipeline and novel metrics to assess ME bias and spatial reasoning in vision-language models.
Findings
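The evaluated VLMs exhibit only weak mutual exclusivity bias.
The models show some ability to use extra spatial context to resolve ambiguity when a scene contains multiple novel objects.
By adding spatial reasoning to traditional ME tasks, MEBench creates more challenging and realistic evaluation settings.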
Abstract
This paper introduces MEBench, a novel benchmark for evaluating mutual exclusivity (ME) bias, a cognitive phenomenon observed in children during word learning. Unlike traditional ME tasks, MEBench further incorporates spatial reasoning to create more challenging and realistic evaluation settings. To facilitate controlled experimentation, we also present a flexible and scalable data generation pipeline that supports the construction of diverse annotated scenes. We assess the performance of various vision-language models (VLMs) on this benchmark using novel evaluation metrics that capture key aspects of ME-based reasoning. We find that these VLMs exhibit weak ME bias, while showing some ability to leverage extra spatial context to resolve ambiguity in multiple novel object settings. Project page: http://mebench.github.io/.
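To make the idea of ME bias concrete, the following is a minimal illustrative sketch, not the paper's metric. It assumes a simplified trial format in which a model is given a novel word and must pick one object from a scene containing familiar and novel objects. The names `Trial`, `me_selection_rate`, and `chance_rate` are hypothetical and do not come from MEBench. A model with ME bias should map the novel word to a novel object more often than chance.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Trial:
    """One hypothetical ME trial (assumed format, not MEBench's schema)."""
    familiar_ids: Set[str]  # objects whose names the model is assumed to know
    novel_ids: Set[str]     # objects with no known name
    chosen_id: str          # object the model selected for the novel word


def me_selection_rate(trials: List[Trial]) -> float:
    """Fraction of trials where the model chose a novel object."""
    if not trials:
        raise ValueError("need at least one trial")
    hits = sum(t.chosen_id in t.novel_ids for t in trials)
    return hits / len(trials)


def chance_rate(trials: List[Trial]) -> float:
    """Expected selection rate if the model picked objects uniformly at random."""
    if not trials:
        raise ValueError("need at least one trial")
    per_trial = [
        len(t.novel_ids) / (len(t.novel_ids) + len(t.familiar_ids))
        for t in trials
    ]
    return sum(per_trial) / len(per_trial)


# Two toy scenes, each with three familiar objects and one novel object.
trials = [
    Trial({"cup", "ball", "shoe"}, {"obj_a"}, chosen_id="obj_a"),
    Trial({"cup", "ball", "shoe"}, {"obj_b"}, chosen_id="ball"),
]
rate = me_selection_rate(trials)
baseline = chance_rate(trials)
```

Comparing `rate` against `baseline` gives a rough sense of whether selections favor novel objects. A rate close to the baseline would correspond to the weak ME bias described above.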
