ShapeWorld - A new test methodology for multimodal language understanding

Alexander Kuhnle; Ann Copestake

arXiv:1704.04517·cs.CL·April 18, 2017·47 cites

Open Access 3 Repos

TL;DR

ShapeWorld presents a new framework for evaluating multimodal language understanding by automatically generating controlled artificial data, enabling detailed assessment of models' generalization and reasoning abilities.

Contribution

It introduces a novel, controllable data generation methodology for testing multimodal models, emphasizing their generalization capabilities beyond existing benchmarks.

Findings

01 Models show varying generalization abilities across tasks

02 Framework provides detailed insights into model strengths and weaknesses

03 Open-sourcing encourages further research in multimodal understanding

Abstract

We introduce a novel framework for evaluating multimodal deep learning models with respect to their language understanding and generalization abilities. In this approach, artificial data is automatically generated according to the experimenter's specifications. The content of the data, both during training and evaluation, can be controlled in detail, which enables tasks to be created that require true generalization abilities, in particular the combination of previously introduced concepts in novel ways. We demonstrate the potential of our methodology by evaluating various visual question answering models on four different tasks, and show how our framework gives us detailed insights into their capabilities and limitations. By open-sourcing our framework, we hope to stimulate progress in the field of multimodal language understanding.
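To make the idea of controlled generation concrete, here is a minimal Python sketch of a train/evaluation split that holds out attribute combinations. It is an illustration under stated assumptions, not the ShapeWorld codebase or its API: the names `SHAPES`, `COLORS`, `HELD_OUT`, `generate_instance`, and `generate_dataset` are hypothetical, worlds are symbolic lists rather than rendered images, and the single caption template is invented for this example.

```python
import itertools
import random

# Hypothetical attribute inventory (not taken from the paper).
SHAPES = ["square", "circle", "triangle"]
COLORS = ["red", "green", "blue"]

# Hypothetical held-out combinations: each color and each shape still occurs
# in training, but never together in these particular pairings.
HELD_OUT = {("red", "square"), ("green", "triangle")}

ALL_COMBOS = sorted(itertools.product(COLORS, SHAPES))
TRAIN_COMBOS = [combo for combo in ALL_COMBOS if combo not in HELD_OUT]


def generate_instance(rng, world_combos, caption_combos, num_objects=3):
    """Sample a symbolic world, a caption about it, and a truth label."""
    world = [rng.choice(world_combos) for _ in range(num_objects)]
    color, shape = rng.choice(caption_combos)
    caption = f"There is a {color} {shape}."
    # The caption agrees with the world if such an object is present.
    agreement = (color, shape) in world
    return {"world": world, "caption": caption, "agreement": agreement}


def generate_dataset(seed, world_combos, caption_combos, size):
    """Generate a reproducible list of instances from a fixed seed."""
    rng = random.Random(seed)
    return [
        generate_instance(rng, world_combos, caption_combos)
        for _ in range(size)
    ]


# Training never shows or mentions a held-out combination.
train = generate_dataset(0, TRAIN_COMBOS, TRAIN_COMBOS, size=100)
# Evaluation worlds may contain anything; captions probe only held-out pairs.
test = generate_dataset(1, ALL_COMBOS, sorted(HELD_OUT), size=20)
```

Because every individual color and shape appears during training, a model that answers the evaluation captions correctly must recombine known concepts rather than recall seen pairings, which is the kind of generalization the abstract describes.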

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Speech and dialogue systems