# Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods

**Authors:** Aditya Mogadala, Marimuthu Kalimuthu, Dietrich Klakow

arXiv: 1907.09358 · 2022-01-04

## TL;DR

This survey reviews ten prominent vision-and-language tasks, covering their problem formulations, methods, datasets, evaluation measures, and state-of-the-art results, and outlines potential directions for future research on integrating the two fields.

## Contribution

The survey compares tasks, datasets, and methods in vision-and-language integration. It goes beyond earlier surveys, which are either task-specific or limited to one type of visual content (image or video), and it proposes potential future directions.

## Key findings

- Comparison of results from state-of-the-art methods across ten vision-and-language tasks
- Discussion of existing datasets and evaluation measures for each task
- Identification of existing challenges and potential future research directions

## Abstract

Interest in Artificial Intelligence (AI) and its applications has seen unprecedented growth in the last few years. This success can be partly attributed to the advancements made in sub-fields of AI such as machine learning, computer vision, and natural language processing. Much of the growth in these fields has been made possible by deep learning, a sub-area of machine learning that uses artificial neural networks. This has created significant interest in the integration of vision and language. In this survey, we focus on ten prominent tasks that integrate language and vision by discussing their problem formulation, methods, existing datasets, and evaluation measures, and by comparing the results obtained with the corresponding state-of-the-art methods. Our efforts go beyond earlier surveys, which are either task-specific or concentrate on only one type of visual content, i.e., image or video. Furthermore, we provide some potential future directions in this field of research, in the hope that this survey will stimulate innovative thoughts and ideas to address the existing challenges and build new applications.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.09358/full.md

## Figures

25 figures with captions in the complete paper: https://tomesphere.com/paper/1907.09358/full.md

## References

443 references — full list in the complete paper: https://tomesphere.com/paper/1907.09358/full.md

---
Source: https://tomesphere.com/paper/1907.09358