WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models

Yonatan Bitton, Nitzan Bitton Guetta, Ron Yosef, Yuval Elovici, Mohit Bansal, Gabriel Stanovsky, Roy Schwartz

arXiv:2207.12576 · cs.CL · October 12, 2022 · 6 citations




TL;DR

WinoGAViL is a gamified benchmark that challenges the commonsense reasoning of vision-and-language models: an online game collects human-created associations, and evaluating models on them reveals the limitations of current systems.

Contribution

The paper presents WinoGAViL, a novel gamified benchmark for evaluating vision-and-language models' association and reasoning skills, along with a dataset and interactive platform.

Findings

01. Humans find the associations intuitive (>90% Jaccard index).

02. The best state-of-the-art model, ViLT, scores only 52%.

03. Solving the associations requires diverse reasoning skills.
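The Jaccard index in the findings measures the overlap between the set of candidates a solver selects and the set the spymaster intended: the size of their intersection divided by the size of their union. A minimal Python sketch follows; the candidate names are hypothetical and only illustrate the arithmetic.

```python
from typing import Set


def jaccard_index(predicted: Set[str], gold: Set[str]) -> float:
    """Jaccard index |A & B| / |A | B| between two sets of selected candidates."""
    if not predicted and not gold:
        # Convention chosen here: two empty selections count as a perfect match.
        return 1.0
    return len(predicted & gold) / len(predicted | gold)


# Hypothetical example: the spymaster's intended associations vs. a solver's picks
gold = {"full_moon", "wolf", "forest"}
pred = {"full_moon", "wolf", "castle"}
print(f"{jaccard_index(pred, gold):.2f}")  # 2 shared / 4 total = 0.50
```

Because the score is set-based, partially correct selections earn partial credit instead of being counted simply as wrong.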

Abstract

While vision-and-language models perform well on tasks such as visual question answering, they struggle when it comes to basic human commonsense reasoning skills. In this work, we introduce WinoGAViL: an online game of vision-and-language associations (e.g., between werewolves and a full moon), used as a dynamic evaluation benchmark. Inspired by the popular card game Codenames, a spymaster gives a textual cue related to several visual candidates, and another player tries to identify them. Human players are rewarded for creating associations that are challenging for a rival AI model but still solvable by other human players. We use the game to collect 3.5K instances, finding that they are intuitive for humans (>90% Jaccard index) but challenging for state-of-the-art AI models, where the best model (ViLT) achieves a score of 52%, succeeding mostly where the cue is visually salient. Our…
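One round of the game described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The cue-image similarity scores are hypothetical stand-ins for the output of a model such as ViLT. The solver is assumed to know how many associations to pick. The `is_good_association` thresholds are invented only to express the rule that an association should be hard for the rival AI model yet solvable by other humans.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    # Jaccard index |A & B| / |A | B|; two empty sets count as a perfect match
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def model_select(scores: Dict[str, float], k: int) -> Set[str]:
    # Pick the k candidates the model scores highest for the cue.
    # Assumption: the solver is told how many associations (k) to choose.
    ranked = sorted(scores, key=lambda c: scores[c], reverse=True)
    return set(ranked[:k])


def is_good_association(model_score: float, human_score: float,
                        max_model: float = 0.5, min_human: float = 0.8) -> bool:
    # Hypothetical filter: the thresholds are illustrative, not the game's actual rules.
    return model_score <= max_model and human_score >= min_human


# Hypothetical round: cue "werewolf" and the spymaster's intended associations
gold = {"full_moon", "silver_bullet"}
# Hypothetical cue-image similarity scores from a vision-and-language model
scores = {"full_moon": 0.31, "wolf": 0.35, "silver_bullet": 0.12,
          "castle": 0.20, "forest": 0.22}

pred = model_select(scores, k=len(gold))   # picks "wolf" and "full_moon"
model_score = jaccard(pred, gold)          # 1 shared / 3 total
human_score = jaccard({"full_moon", "silver_bullet"}, gold)  # a human solver's picks
print(sorted(pred), round(model_score, 2), human_score)
print(is_good_association(model_score, human_score))
```

In this toy round the model is drawn to the visually salient distractor ("wolf") and misses "silver_bullet", which mirrors the abstract's observation that models succeed mostly where the cue is visually salient.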


Code & Models

Repositories

winogavil/winogavil-experiments
PyTorch · Official


Videos

WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning