A Comprehensive Review of the Video-to-Text Problem

Jesus Perez-Martin, Benjamin Bustos, Silvio Jamil F. Guimarães, Ivan Sipiran, Jorge Pérez, and Grethel Coello Said

arXiv:2103.14785·cs.CV·December 2, 2021

TL;DR

This paper reviews the video-to-text problem, covering state-of-the-art methods, datasets, evaluation metrics, open challenges, and future directions for connecting videos with textual descriptions.

Contribution

It categorizes and analyzes existing techniques, evaluates 26 datasets, and discusses progress, challenges, and future research avenues in video-to-text tasks.

Findings

01 Progress in video captioning and retrieval techniques
02 Identification of dataset limitations and strengths
03 Discussion of key challenges and future directions

Abstract

Research in the Vision and Language area encompasses challenging topics that seek to connect visual and textual information. When the visual information comes from videos, this leads to Video-Text Research, which includes several challenging tasks such as video question answering, video summarization with natural language, and video-to-text and text-to-video conversion. This paper reviews the video-to-text problem, in which the goal is to associate an input video with its textual description. This association is made mainly in one of two ways: by retrieving the most relevant descriptions from a corpus, or by generating a new description for a given video. These two approaches correspond to essential tasks for the Computer Vision and Natural Language Processing communities: text retrieval from video and video captioning/description. These two tasks are substantially more complex than predicting or…
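The retrieval formulation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the video and every candidate description have already been mapped into a shared embedding space, and it ranks descriptions by cosine similarity to the video. The function names, the three-dimensional vectors, and the example sentences are hypothetical and chosen only for illustration; they do not come from the paper, and real systems learn these embeddings with trained encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_descriptions(video_vec, corpus):
    """Return the corpus descriptions sorted from most to least similar to the video."""
    scored = [(cosine(video_vec, vec), text) for text, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


# Hypothetical, hand-made embeddings (assumption: a shared video-text space).
video_vec = [0.9, 0.1, 0.0]
corpus = {
    "a man is playing a guitar": [0.8, 0.2, 0.1],
    "a dog runs on the beach": [0.0, 0.9, 0.3],
    "a woman is cooking": [0.1, 0.1, 0.9],
}

ranking = rank_descriptions(video_vec, corpus)
print(ranking[0])
```

The generation formulation (video captioning) differs in that no fixed corpus is searched: a decoder produces a new sentence conditioned on the video representation.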

Code & Models

Repositories

jssprz/video_captioning_datasets
Official
