A Survey on Natural Language Video Localization
Xinfang Liu, Xiushan Nie (Member, IEEE), Zhifang Tan, Jie Guo, Yilong Yin

TL;DR
This paper provides a comprehensive survey of natural language video localization, covering algorithms, datasets, evaluation metrics, and future directions in this emerging field.
Contribution
It offers a systematic categorization and analysis of supervised and weakly-supervised NLVL methods, along with dataset and evaluation protocol summaries.
Findings
Analysis of strengths and weaknesses of existing methods
Overview of datasets and evaluation protocols
Identification of future research directions
Abstract
Natural language video localization (NLVL), which aims to locate the target moment in a video that semantically corresponds to a text query, is a novel and challenging task. In this paper, we present a comprehensive survey of NLVL algorithms: we first propose a general pipeline for NLVL, then categorize existing methods into supervised and weakly-supervised approaches, followed by an analysis of the strengths and weaknesses of each category. Subsequently, we present the datasets, evaluation protocols, and a general performance analysis. Finally, we outline possible future directions drawn from a summary of the existing methods.
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition
