CIVET: Systematic Evaluation of Understanding in VLMs

Massimo Rizzoli; Simone Alghisi; Olha Khomyn; Gabriel Roccabruna; Seyed Mahed Mousavi; Giuseppe Riccardi

arXiv:2506.05146 · cs.CV · June 23, 2025

TL;DR

This paper introduces CIVET, a systematic framework for evaluating vision-language models' understanding of object properties and relations, revealing their limitations compared to human performance.

Contribution

CIVET provides a standardized, extensible method for assessing VLMs' scene understanding, addressing previous evaluation gaps with controlled stimuli and statistical rigor.

Findings

1. VLMs accurately recognize only a limited set of object properties.

2. VLM performance depends on object position.

3. VLMs struggle to understand object relations.

Abstract

While Vision-Language Models (VLMs) have achieved competitive performance in various tasks, their comprehension of the underlying structure and semantics of a scene remains understudied. To investigate the understanding of VLMs, we study their capability regarding object properties and relations in a controlled and interpretable manner. To this scope, we introduce CIVET, a novel and extensible framework for systematiC evaluatIon Via controllEd sTimuli. CIVET addresses the lack of standardized systematic evaluation for assessing VLMs' understanding, enabling researchers to test hypotheses with statistical rigor. With CIVET, we evaluate five state-of-the-art VLMs on exhaustive sets of stimuli, free from annotation noise, dataset-specific biases, and uncontrolled scene complexity. Our findings reveal that 1) current VLMs can accurately recognize only a limited set of basic object…
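
The idea of evaluating on "exhaustive sets of stimuli" can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the property values (`SHAPES`, `COLORS`), the 3x3 grid, the helper names, and the `toy_model` stand-in that replaces a real VLM. It enumerates every combination of property and position exactly once, then groups accuracy by position so that a position-dependent effect would be visible.

```python
from collections import defaultdict
from itertools import product

# Hypothetical property values and grid size; not taken from the paper.
SHAPES = ["circle", "square", "triangle"]
COLORS = ["red", "green", "blue"]
GRID = 3  # objects are placed on a GRID x GRID board


def enumerate_stimuli():
    """Return every (shape, color, row, col) combination exactly once."""
    return [
        {"shape": s, "color": c, "row": r, "col": k}
        for s, c, r, k in product(SHAPES, COLORS, range(GRID), range(GRID))
    ]


def accuracy_by_position(stimuli, predict, target="color"):
    """Compute the fraction of correct predictions for each (row, col)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for stimulus in stimuli:
        pos = (stimulus["row"], stimulus["col"])
        totals[pos] += 1
        hits[pos] += int(predict(stimulus) == stimulus[target])
    return {pos: hits[pos] / totals[pos] for pos in totals}


def toy_model(stimulus):
    """Stand-in for a VLM: correct only in the top row (purely illustrative)."""
    return stimulus["color"] if stimulus["row"] == 0 else "unknown"


stimuli = enumerate_stimuli()
scores = accuracy_by_position(stimuli, toy_model)
```

Because the stimulus set is a full Cartesian product, every position sees the same mix of shapes and colors, so a difference in accuracy between positions cannot be attributed to an uneven distribution of properties. In a real evaluation, `toy_model` would be replaced by a call that renders the stimulus as an image and queries the model under test.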

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Sparse Evolutionary Training