Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction

Jingwen Wang; Lin Ma; Wenhao Jiang

arXiv:1909.05010 · cs.CV · December 19, 2019 · 20 cites




TL;DR

This paper introduces CBP, an end-to-end boundary-aware model for temporally grounding language queries in videos. It predicts semantic boundaries from contextual information to localize the matching video segment more precisely.

Contribution

It proposes a novel boundary-aware approach that explicitly predicts semantic boundaries and aggregates contextual information from neighboring elements, yielding more precise video segment localization.
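To illustrate why explicit boundary prediction can sharpen localization, here is a minimal sketch that re-ranks candidate segments by combining a per-segment content score with per-time-step start and end probabilities. The multiplicative fusion rule, the function name `rank_segments`, and the inclusive `(start, end)` index convention are all assumptions made for illustration. The paper's actual scoring may differ.

```python
from typing import List, Tuple

# Hypothetical convention: a segment is (start_step, end_step), both inclusive.
Segment = Tuple[int, int]


def rank_segments(
    candidates: List[Segment],
    content_scores: List[float],
    start_probs: List[float],
    end_probs: List[float],
) -> List[Tuple[Segment, float]]:
    """Rank candidates by content * P(start at s) * P(end at e).

    The multiplicative fusion is an illustrative assumption, not CBP's formula.
    """
    scored = []
    for (s, e), content in zip(candidates, content_scores):
        # Boundary evidence rewards segments whose endpoints land on likely boundaries.
        scored.append(((s, e), content * start_probs[s] * end_probs[e]))
    # Most confident segment first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored
```

For example, with candidates `[(0, 2), (1, 3)]`, content scores `[0.9, 0.6]`, start probabilities `[0.2, 0.9, 0.1, 0.1]` and end probabilities `[0.1, 0.1, 0.3, 0.8]`, the second candidate ranks first (0.432 vs. 0.054). Its endpoints fall on much more likely boundaries even though its content score is lower.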

Findings

01

CBP outperforms existing methods on three public datasets.

02

The model achieves higher localization precision than prior sliding-window and anchor-based approaches.

03

Contextual boundary modeling improves segment localization accuracy.

Abstract

The task of temporally grounding language queries in videos is to temporally localize the video segment that best matches a given language query (sentence). It requires models to perform visual and linguistic understanding simultaneously. Previous work predominantly ignores the precision of segment localization. Sliding-window-based methods use predefined search window sizes and therefore suffer from redundant computation, while existing anchor-based approaches fail to yield precise localization. We address this issue by proposing an end-to-end boundary-aware model that uses a lightweight branch to predict semantic boundaries corresponding to the given linguistic information. To better detect semantic boundaries, we propose to aggregate contextual information by explicitly modeling the relationship between the current element and its neighbors. The most confident segments are…
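The idea of aggregating context by relating each element to its neighbors can be illustrated with a minimal local-attention sketch. It assumes that each time step attends only to neighbors within a fixed window of radius `radius`, weights them by softmax-normalized dot-product similarity to its own feature, and adds the weighted context back to that feature. The function name `aggregate_context`, the window radius, and the additive fusion are illustrative assumptions, not CBP's published formulation.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def aggregate_context(features: List[Vector], radius: int = 1) -> List[Vector]:
    """Local-attention context aggregation (hypothetical sketch, not CBP's exact layer).

    For each step t, neighbors in [t - radius, t + radius] (excluding t) are
    weighted by softmax(dot(features[t], neighbor)); the weighted sum is added
    to features[t].
    """
    T = len(features)
    out: List[Vector] = []
    for t in range(T):
        lo, hi = max(0, t - radius), min(T, t + radius + 1)
        neighbors = [features[j] for j in range(lo, hi) if j != t]
        if not neighbors:
            # No neighbors in the window: keep the feature unchanged.
            out.append(list(features[t]))
            continue
        scores = [dot(features[t], n) for n in neighbors]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        dim = len(features[t])
        context = [sum(w * n[d] for w, n in zip(weights, neighbors)) for d in range(dim)]
        # Additive fusion of the element with its aggregated context (an assumption).
        out.append([f + c for f, c in zip(features[t], context)])
    return out
```

Restricting attention to a small window keeps the cost linear in sequence length and focuses each step on the nearby frames that matter most for deciding whether a boundary occurs there.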


Code & Models

Repositories

JaywongWang/CBP (TensorFlow, official)


Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization