A Survey of Current Datasets for Vision and Language Research

Francis Ferraro; Nasrin Mostafazadeh; Ting-Hao (Kenneth) Huang; Lucy Vanderwende; Jacob Devlin; Michel Galley; Margaret Mitchell

arXiv:1506.06833 · cs.CL · August 23, 2021

TL;DR

This survey reviews recent vision-and-language datasets and proposes quality metrics for evaluating and categorizing them. It finds that newer datasets use increasingly complex language and that each dataset has its own strengths and weaknesses.

Contribution

The paper introduces a set of quality metrics for evaluating vision-and-language datasets, categorizes the datasets accordingly, and analyzes their characteristics and how they have evolved.

Findings

1. Recent datasets use more complex language and more abstract concepts.
2. Different datasets exhibit distinct strengths and weaknesses.
3. The proposed metrics provide a basis for evaluating and comparing dataset quality.

Abstract

Integrating vision and language has long been a goal of artificial intelligence (AI) research. In the past two years, we have witnessed an explosion of work that brings together vision and language, from images to videos and beyond. The available corpora have played a crucial role in advancing this area of research. In this paper, we propose a set of quality metrics for evaluating and analyzing vision & language datasets, and we categorize the datasets accordingly. Our analyses show that the most recent datasets use more complex language and more abstract concepts; however, each dataset has its own strengths and weaknesses.
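The abstract does not spell out the individual metrics. As a hedged illustration only, the following sketch computes a few simple corpus statistics of the kind such language-quality metrics might build on: vocabulary size, average caption length, and the share of tokens treated as "abstract". The `ABSTRACT_WORDS` set, the `corpus_stats` function, and the naive tokenizer are hypothetical stand-ins, not the authors' method. A real analysis would use a proper tokenizer and a published concreteness lexicon.

```python
import string
from collections import Counter
from dataclasses import dataclass

# Hypothetical tiny lexicon of "abstract" terms (an assumption for illustration);
# a real analysis would draw on a published concreteness resource instead.
ABSTRACT_WORDS = {"idea", "happiness", "freedom", "time", "love", "fun"}


@dataclass
class CorpusStats:
    num_sentences: int
    vocab_size: int             # number of distinct token types
    avg_sentence_length: float  # mean tokens per sentence
    abstract_ratio: float       # share of tokens found in ABSTRACT_WORDS


def tokenize(sentence: str) -> list[str]:
    # Naive tokenizer: split on whitespace, strip surrounding punctuation, lowercase.
    tokens = []
    for raw in sentence.split():
        tok = raw.strip(string.punctuation).lower()
        if tok:
            tokens.append(tok)
    return tokens


def corpus_stats(sentences: list[str]) -> CorpusStats:
    # Count every token across all sentences in the corpus.
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    total_tokens = sum(counts.values())
    abstract_tokens = sum(c for word, c in counts.items() if word in ABSTRACT_WORDS)
    return CorpusStats(
        num_sentences=len(sentences),
        vocab_size=len(counts),
        avg_sentence_length=total_tokens / len(sentences) if sentences else 0.0,
        abstract_ratio=abstract_tokens / total_tokens if total_tokens else 0.0,
    )


if __name__ == "__main__":
    captions = ["A dog runs on the beach.", "Two kids having fun at the park."]
    print(corpus_stats(captions))
```

On the two example captions, the sketch reports 12 distinct words, an average length of 6.5 tokens, and one abstract token ("fun") out of 13. Comparing such numbers across corpora is one simple way to see a shift toward richer vocabulary and more abstract language.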
