VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena

Letitia Parcalabescu; Michele Cafagna; Lilitta Muradjan; Anette Frank; Iacer Calixto; Albert Gatt

arXiv:2112.07566·cs.CL·February 13, 2024·6 cites

Open Access · 1 Repo

TL;DR

VALSE is a new benchmark for evaluating vision and language models on their ability to understand and ground specific linguistic phenomena in visual data, enabling more detailed assessments of their linguistic and visual reasoning capabilities.

Contribution

The paper introduces VALSE, a benchmark of six tests that each target specific linguistic phenomena. It is built with methods that support the construction of valid foils, and it enables finer-grained evaluation of V&L models than was previously possible.

Findings

01

Current models struggle with most linguistic phenomena.

02

VALSE reveals gaps in models' visio-linguistic grounding abilities.

03

VALSE is expected to serve as a benchmark for measuring future progress of pretrained V&L models from a linguistic perspective.

Abstract

We propose VALSE (Vision And Language Structured Evaluation), a novel benchmark designed for testing general-purpose pretrained vision and language (V&L) models for their visio-linguistic grounding capabilities on specific linguistic phenomena. VALSE offers a suite of six tests covering various linguistic constructs. Solving these requires models to ground linguistic phenomena in the visual modality, allowing more fine-grained evaluations than hitherto possible. We build VALSE using methods that support the construction of valid foils, and report results from evaluating five widely-used V&L models. Our experiments suggest that current models have considerable difficulty addressing most phenomena. Hence, we expect VALSE to serve as an important benchmark to measure future progress of pretrained V&L models from a linguistic perspective, complementing the canonical task-centred V&L…
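As an illustration of how a caption/foil benchmark of this kind can be scored, the sketch below computes pairwise accuracy: the fraction of examples where a model rates the correct caption above its foil for the same image. This is a minimal sketch under stated assumptions, not the authors' evaluation code. The names `FoilExample`, `pairwise_accuracy`, `toy_score`, and `TOY_TAGS` are hypothetical, and the word-overlap scorer is only a stand-in for a real V&L model's image-text alignment score.

```python
from typing import Callable, Iterable, NamedTuple


class FoilExample(NamedTuple):
    """One benchmark item (hypothetical schema): an image, its caption, and a foil."""
    image_id: str
    caption: str  # sentence that correctly describes the image
    foil: str     # minimally altered sentence that no longer describes the image


def pairwise_accuracy(
    examples: Iterable[FoilExample],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of examples where the caption scores strictly higher than the foil."""
    items = list(examples)
    if not items:
        raise ValueError("need at least one example")
    correct = sum(
        1 for ex in items
        if score(ex.image_id, ex.caption) > score(ex.image_id, ex.foil)
    )
    return correct / len(items)


# Assumption: hand-written tags stand in for what a model "sees" in each image.
TOY_TAGS = {
    "img1": {"two", "dogs", "grass"},
    "img2": {"cat", "sofa"},
}


def toy_score(image_id: str, text: str) -> float:
    """Stand-in scorer: counts words in the text that match the image's tags."""
    tags = TOY_TAGS[image_id]
    words = (w.strip(".,") for w in text.lower().split())
    return float(sum(1 for w in words if w in tags))


EXAMPLES = [
    # Counting-style foil: the number word is changed.
    FoilExample("img1", "Two dogs on the grass.", "Three dogs on the grass."),
    # Spatial-relation-style foil: the preposition is changed.
    FoilExample("img2", "A cat sits on the sofa.", "A cat sits under the sofa."),
]

if __name__ == "__main__":
    print(pairwise_accuracy(EXAMPLES, toy_score))
```

The toy scorer separates the first pair but ties on the second, because it ignores prepositions. Ties count as failures under the strict comparison, which is a design choice of this sketch rather than something the abstract specifies.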

Code & Models

Repositories

heidelberg-nlp/valse
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques