Is the Red Square Big? MALeViC: Modeling Adjectives Leveraging Visual Contexts

Sandro Pezzelle; Raquel Fernández

arXiv:1908.10285 · cs.CL · August 28, 2019

TL;DR

This paper investigates how multi-modal models learn the meaning of size adjectives like 'big' and 'small' from visual contexts, showing that they can assess size relationally but struggle to form abstract, compositional representations.

Contribution

It introduces a relational, context-dependent approach to modeling size adjectives in visual scenes and evaluates the capabilities and limitations of state-of-the-art models on this task.

Findings

1. Models can learn the function underlying the meaning of size adjectives in simple contexts.
2. Performance declines as task complexity increases.
3. Models fail to develop abstract, compositional representations.

Abstract

This work aims to model how the meaning of gradable adjectives of size ('big', 'small') can be learned from visually grounded contexts. Inspired by cognitive and linguistic evidence showing that the use of these expressions relies on setting a threshold that depends on the specific context, we investigate the ability of multi-modal models to assess whether an object is 'big' or 'small' in a given visual scene. In contrast with the standard computational approach, which simplistically treats gradable adjectives as 'fixed' attributes, we pose the problem as relational: to be successful, a model has to consider the full visual context. By means of four main tasks, we show that state-of-the-art models (but not a relatively strong baseline) can learn the function subtending the meaning of size adjectives, though their performance is found to decrease while moving from simple to more…
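To make the contrast between the "fixed attribute" view and the relational view concrete, here is a minimal Python sketch. It is an illustration only: the threshold form T = max − k·(max − min), the parameter k = 0.3, the mirrored "small" threshold, and the constant FIXED_AREA_CUTOFF are all assumptions chosen for this example, not the exact function or values used to build MALeViC. Object sizes are represented simply as areas.

```python
from typing import Sequence

# Hypothetical constant for the "fixed attribute" view: an object counts as
# big whenever its area exceeds this value, whatever else is in the scene.
FIXED_AREA_CUTOFF = 50.0


def is_big_fixed(target_area: float) -> bool:
    """Context-independent baseline: compare against a fixed constant."""
    return target_area > FIXED_AREA_CUTOFF


def _area_range(scene_areas: Sequence[float]) -> tuple[float, float]:
    """Return (min, max) of the object areas in a scene (target included)."""
    if not scene_areas:
        raise ValueError("scene must contain at least the target object")
    return min(scene_areas), max(scene_areas)


def is_big_relational(target_area: float, scene_areas: Sequence[float], k: float = 0.3) -> bool:
    """Big if the area reaches the assumed threshold T = max - k * (max - min)."""
    lo, hi = _area_range(scene_areas)
    return target_area >= hi - k * (hi - lo)


def is_small_relational(target_area: float, scene_areas: Sequence[float], k: float = 0.3) -> bool:
    """Small if the area is at most the mirrored threshold min + k * (max - min)."""
    lo, hi = _area_range(scene_areas)
    return target_area <= lo + k * (hi - lo)


if __name__ == "__main__":
    # The same red square (area 40) placed in two different scenes.
    scene_a = [10.0, 15.0, 40.0]   # the square is the largest object here
    scene_b = [40.0, 80.0, 120.0]  # here it is the smallest
    print(is_big_relational(40.0, scene_a), is_big_relational(40.0, scene_b))  # → True False
    # The fixed view never consults the scene, so it answers the same for both.
    print(is_big_fixed(40.0))  # → False
```

Under the fixed view the answer never changes, while under the relational view the same square flips from 'big' to 'small' once larger objects enter the scene. This is why, in the relational framing, a model must take the full visual context into account.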
