The Scenario Refiner: Grounding subjects in images at the morphological level

Claudia Tagliaferri; Sofia Axioti; Albert Gatt; Denis Paperno

arXiv:2309.11252 · cs.CL · September 21, 2023

Open Access

TL;DR

This paper investigates whether vision-language models understand morphological distinctions like 'runner' versus 'running' by comparing their predictions to human judgments, revealing biases and differences in semantic grounding.

Contribution

Introduces a new methodology and dataset to evaluate V&L models' ability to capture morphological distinctions, highlighting model biases and the influence of model architecture.

Findings

01. Models differ from humans in morphological understanding.

02. Models exhibit grammatical biases in visual scenario prediction.

03. The methodology can be extended to other nuanced language features.

Abstract

Derivationally related words, such as "runner" and "running", exhibit semantic differences which also elicit different visual scenarios. In this paper, we ask whether Vision and Language (V&L) models capture such distinctions at the morphological level, using a new methodology and dataset. We compare the results from V&L models to human judgements and find that models' predictions differ from those of human participants, in particular displaying a grammatical bias. We further investigate whether the human-model misalignment is related to model architecture. Our methodology, developed on one specific morphological contrast, can be further extended for testing models on capturing other nuanced language features.
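The abstract does not spell out how model predictions are compared with human judgements. As a minimal sketch, assuming each image is paired with an agentive-noun caption ("a runner") and an -ing caption ("someone running"), that a model assigns each caption an image-text score (for example a CLIP-style similarity), and that the human data are the share of annotators choosing the noun caption, agreement and a directional preference rate could be computed as below. The `Item` schema, field names, the 0.5 majority threshold and all numbers are hypothetical illustrations, not the authors' procedure or data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Item:
    """One image paired with two derivationally related captions (hypothetical schema)."""
    noun_caption: str        # agentive noun form, e.g. "a runner"
    ing_caption: str         # -ing form, e.g. "someone running"
    model_noun_score: float  # model's image-text score for the noun caption
    model_ing_score: float   # model's image-text score for the -ing caption
    human_noun_share: float  # fraction of annotators who chose the noun caption


def model_prefers_noun(item: Item) -> bool:
    # The model "chooses" whichever caption it scores higher (ties count as the -ing form).
    return item.model_noun_score > item.model_ing_score


def human_majority(item: Item) -> str | None:
    # Majority label from annotators; None when the vote is split exactly 50/50.
    if item.human_noun_share > 0.5:
        return "noun"
    if item.human_noun_share < 0.5:
        return "ing"
    return None


def agreement(items: list[Item]) -> float:
    """Share of items with a clear human majority where the model picks the same caption."""
    decided = [it for it in items if human_majority(it) is not None]
    if not decided:
        raise ValueError("no items with a clear human majority")
    hits = sum((human_majority(it) == "noun") == model_prefers_noun(it) for it in decided)
    return hits / len(decided)


def noun_preference_rate(items: list[Item]) -> float:
    """Fraction of items where the model prefers the noun caption (a skew may hint at a bias)."""
    if not items:
        raise ValueError("empty item list")
    return sum(model_prefers_noun(it) for it in items) / len(items)


# Toy, invented numbers purely to exercise the functions.
items = [
    Item("a runner", "someone running", 0.31, 0.29, 0.8),
    Item("a swimmer", "someone swimming", 0.25, 0.28, 0.7),
    Item("a baker", "someone baking", 0.22, 0.27, 0.3),
    Item("a singer", "someone singing", 0.30, 0.33, 0.5),
]

print(agreement(items))             # 2 of the 3 decided items match
print(noun_preference_rate(items))  # 0.25
```

Excluding exact 50/50 splits keeps items on which humans themselves show no preference from counting for or against the model; a fuller analysis would presumably also compare against a chance baseline.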

Taxonomy

Topics: Multimodal Machine Learning Applications · Categorization, perception, and language