Adaptive Proposal Generation Network for Temporal Sentence Localization in Videos

Daizong Liu; Xiaoye Qu; Jianfeng Dong; Pan Zhou

arXiv:2109.06398 · cs.CV · September 15, 2021

TL;DR

This paper introduces an Adaptive Proposal Generation Network (APGN) that combines the efficiency of bottom-up approaches with segment-level interaction, significantly improving temporal sentence localization in videos.

Contribution

The paper proposes a novel APGN that adaptively generates proposals by foreground-background classification, reducing redundancy and enhancing semantic quality, thus outperforming existing methods.

Findings

01. APGN achieves state-of-the-art results on three benchmarks.

02. The method reduces redundant proposals compared to traditional top-down approaches.

03. Semantic quality of proposals is significantly improved.

Abstract

We address the problem of temporal sentence localization in videos (TSLV). Traditional methods follow a top-down framework, which localizes the target segment using pre-defined segment proposals. Although these methods have achieved decent performance, the proposals are handcrafted and redundant. Recently, the bottom-up framework has attracted increasing attention due to its superior efficiency: it directly predicts, for each frame, the probability of being a boundary. However, bottom-up models perform worse than their top-down counterparts because they fail to exploit segment-level interaction. In this paper, we propose an Adaptive Proposal Generation Network (APGN) that maintains segment-level interaction while improving efficiency. Specifically, we first perform foreground-background classification on the video and then regress on the foreground frames to adaptively generate proposals. In this…
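The proposal-generation step described in the abstract (classify frames as foreground or background, then regress boundaries from the foreground frames only) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `generate_proposals`, the inputs `fg_scores`, `left_offsets`, and `right_offsets`, the 0.5 threshold, and the exact-duplicate filtering are all assumptions made for illustration, since the abstract excerpt does not specify the regression targets or any post-processing.

```python
from typing import List, Tuple


def generate_proposals(
    fg_scores: List[float],      # assumed: per-frame foreground probability
    left_offsets: List[float],   # assumed: regressed distance (in frames) to the segment start
    right_offsets: List[float],  # assumed: regressed distance (in frames) to the segment end
    fg_threshold: float = 0.5,   # hypothetical cutoff, not taken from the paper
) -> List[Tuple[int, int]]:
    """Return (start, end) frame-index proposals produced by foreground frames only."""
    n = len(fg_scores)
    if not (len(left_offsets) == len(right_offsets) == n):
        raise ValueError("all per-frame inputs must have the same length")

    proposals: List[Tuple[int, int]] = []
    for t in range(n):
        if fg_scores[t] < fg_threshold:
            continue  # background frame: contributes no proposal
        # Each foreground frame proposes one segment around itself,
        # clipped to the valid frame range.
        start = max(0, round(t - left_offsets[t]))
        end = min(n - 1, round(t + right_offsets[t]))
        # Skip inverted segments and exact duplicates.
        if start <= end and (start, end) not in proposals:
            proposals.append((start, end))
    return proposals


# Toy input: frames 2-4 are foreground and all point at the same segment.
scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1]
left = [0, 0, 0, 1, 2, 0]
right = [0, 0, 2, 1, 0, 0]
print(generate_proposals(scores, left, right))  # -> [(2, 4)]
```

In this toy input only three of the six frames produce candidates, and they collapse to a single proposal. A top-down scheme with pre-defined proposals would instead enumerate many fixed windows regardless of the video content, which is the redundancy the abstract refers to.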

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization