A Survey on Natural Language Video Localization

Xinfang Liu, Xiushan Nie (Member, IEEE), Zhifang Tan, Jie Guo, Yilong Yin

arXiv:2104.00234·cs.CV·April 2, 2021·5 cites

TL;DR

This paper provides a comprehensive survey of natural language video localization, covering algorithms, datasets, evaluation metrics, and future directions in this emerging field.

Contribution

It offers a systematic categorization and analysis of supervised and weakly-supervised NLVL methods, along with dataset and evaluation protocol summaries.

Findings

01 Analysis of strengths and weaknesses of existing methods

02 Overview of datasets and evaluation protocols

03 Identification of future research directions

Abstract

Natural language video localization (NLVL), which aims to locate a target moment in a video that semantically corresponds to a text query, is a novel and challenging task. To this end, we present a comprehensive survey of NLVL algorithms: we first propose a pipeline for NLVL, then categorize the algorithms into supervised and weakly-supervised methods, followed by an analysis of the strengths and weaknesses of each kind of method. Subsequently, we present the datasets, evaluation protocols, and a general performance analysis. Finally, we derive possible future perspectives by summarizing the existing methods.

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition