Natural Language Video Localization with Learnable Moment Proposals

Shaoning Xiao; Long Chen; Jian Shao; Yueting Zhuang; Jun Xiao

arXiv:2109.10678 · cs.CV · November 2, 2022 · 1 citation

TL;DR

This paper introduces LPNet, a learnable proposal network for natural language video localization. LPNet adjusts its moment proposals dynamically during training and is reported to outperform both traditional propose-and-rank methods and proposal-free methods.

Contribution

The paper proposes a novel learnable proposal approach with dynamic adjustment and boundary-aware loss, improving localization accuracy over existing methods.

Findings

01. LPNet outperforms state-of-the-art methods on benchmark datasets.

02. Learnable proposals improve coverage and reduce redundancy.

03. Boundary-aware loss enhances frame-level localization accuracy.

Abstract

Given an untrimmed video and a natural language query, Natural Language Video Localization (NLVL) aims to identify the video moment described by the query. Existing methods fall roughly into two groups: 1) propose-and-rank models, which first define a set of hand-designed moment candidates and then select the best-matching one; and 2) proposal-free models, which directly predict the two temporal boundaries of the referential moment from frames. Currently, almost all propose-and-rank methods perform worse than their proposal-free counterparts. In this paper, we argue that the propose-and-rank approach is underestimated because of the predefined manner in which its candidates are generated: 1) hand-designed rules can hardly guarantee complete coverage of the targeted segments; 2) densely sampled candidate moments cause redundant computation and degrade the performance of the ranking process. To this end, we…
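
The two limitations of hand-designed candidates named in the abstract can be illustrated with a small sketch. This is not the paper's code: the sliding-window rule, the function names, the window widths, the stride ratios, and the example target moment are all assumptions chosen for illustration. It generates fixed-width candidates, then measures how well the best candidate overlaps a target moment using temporal IoU.

```python
from typing import List, Tuple

Moment = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(a: Moment, b: Moment) -> float:
    """Intersection over union of two temporal segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def sliding_window_candidates(
    duration: float, widths: List[float], stride_ratio: float
) -> List[Moment]:
    """Hypothetical hand-designed rule: slide fixed-width windows over the video."""
    candidates: List[Moment] = []
    for width in widths:
        stride = width * stride_ratio
        start = 0.0
        while start + width <= duration:
            candidates.append((start, start + width))
            start += stride
    return candidates


def best_iou(target: Moment, candidates: List[Moment]) -> float:
    """Overlap of the best-matching candidate with the target moment."""
    return max(temporal_iou(target, c) for c in candidates)


# Assumed example: a 30-second video and a short 3-second target moment.
target = (11.0, 14.0)
sparse = sliding_window_candidates(30.0, widths=[8.0, 16.0], stride_ratio=0.5)
dense = sliding_window_candidates(
    30.0, widths=[2.0, 3.0, 4.0, 8.0, 16.0], stride_ratio=0.25
)

print(len(sparse), best_iou(target, sparse))  # few candidates, poor coverage
print(len(dense), best_iou(target, dense))    # better coverage, many more candidates
```

With the sparse rule, no candidate matches the short target well; the dense rule covers it better but only by ranking many more, heavily overlapping candidates. This is the trade-off the abstract attributes to predefined candidates.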

Code & Models

Repositories

xiaoneil/lpnet (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning