Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos

Yitian Yuan; Lin Ma; Jingwen Wang; Wei Liu; Wenwu Zhu

arXiv:1910.14303 · cs.CV · November 1, 2019

TL;DR

This paper introduces a semantic conditioned dynamic modulation mechanism that uses sentence semantics to modulate temporal convolutions over video features, aligning the sentence with the video content more precisely and improving grounding accuracy.

Contribution

The paper proposes a semantic conditioned dynamic modulation (SCDM) mechanism that dynamically modulates temporal convolutions based on sentence semantics, strengthening the video-sentence correlation used for grounding.

Findings

01. Outperforms state-of-the-art methods on three datasets

02. Demonstrates improved accuracy in localizing target video segments

03. Shows effectiveness of dynamic semantic modulation in temporal modeling
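The findings above do not say how localization accuracy is measured. A common choice for temporal grounding is the temporal intersection-over-union (IoU) between a predicted segment and the ground-truth segment; the sketch below assumes that metric, and the function name `temporal_iou` and the `(start, end)` tuple layout are hypothetical choices for illustration.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two segments given as (start, end) with start <= end.

    Assumption: segments are expressed in the same unit (e.g. seconds).
    """
    # Overlap is zero when the segments do not intersect.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union = sum of both lengths minus the overlap counted twice.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


# A prediction is typically counted as correct when its IoU with the
# ground truth exceeds a chosen threshold (the threshold is an assumption).
is_hit = temporal_iou((0.0, 10.0), (5.0, 15.0)) >= 0.3
```

Under this kind of metric, a predicted segment that only touches the ground truth at a boundary scores zero, while an exact match scores one.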

Abstract

Temporal sentence grounding in videos aims to detect and localize one target video segment, which semantically corresponds to a given sentence. Existing methods mainly tackle this task via matching and aligning semantics between a sentence and candidate video segments, while neglecting the fact that the sentence information plays an important role in temporally correlating and composing the described contents in videos. In this paper, we propose a novel semantic conditioned dynamic modulation (SCDM) mechanism, which relies on the sentence semantics to modulate the temporal convolution operations for better correlating and composing the sentence-related video contents over time. More importantly, the proposed SCDM performs dynamically with respect to the diverse video contents so as to establish a more precise matching relationship between sentence and video, thereby improving the temporal…
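The abstract describes the mechanism only at a high level. As a rough illustration, the sketch below shows one way sentence semantics could modulate a temporal feature sequence: each temporal position attends over the word features, and the attended sentence vector yields a per-position scale and shift applied to the normalized video feature. The dot-product attention, the tanh-bounded scale and shift, the normalization over time, and all names (`modulate`, `W_gamma`, `W_beta`) are assumptions made for this example, not details confirmed by the text above; the paper and the yytzsy/SCDM repository describe the actual design.

```python
import math
import random


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def modulate(video_feats, word_feats, W_gamma, W_beta, eps=1e-5):
    """Sentence-conditioned scale/shift of a T x D feature sequence.

    video_feats: T x D nested list (output of some temporal conv layer).
    word_feats:  N x D nested list (one vector per word).
    W_gamma, W_beta: D x D projection matrices (hypothetical parameters).
    """
    T, D = len(video_feats), len(video_feats[0])
    # Per-channel statistics over the temporal axis (an assumed normalization).
    mean = [sum(f[d] for f in video_feats) / T for d in range(D)]
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in video_feats) / T + eps)
           for d in range(D)]
    out = []
    for f in video_feats:
        # Each temporal position attends over the words, so the sentence
        # summary differs per position ("dynamic" with respect to the video).
        weights = softmax([dot(f, w) for w in word_feats])
        c = [sum(a * w[d] for a, w in zip(weights, word_feats)) for d in range(D)]
        gamma = [math.tanh(dot(row, c)) for row in W_gamma]
        beta = [math.tanh(dot(row, c)) for row in W_beta]
        out.append([gamma[d] * (f[d] - mean[d]) / std[d] + beta[d]
                    for d in range(D)])
    return out


# Toy inputs: random features stand in for real video and word encodings.
random.seed(0)
T, N, D = 4, 5, 3
video = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
words = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
W_gamma = [[random.gauss(0, 0.5) for _ in range(D)] for _ in range(D)]
W_beta = [[random.gauss(0, 0.5) for _ in range(D)] for _ in range(D)]
out = modulate(video, words, W_gamma, W_beta)
print(len(out), len(out[0]))  # one modulated D-dim feature per temporal position
```

In a full model, a step like this would sit between temporal convolution layers, so that later layers compose video content that has already been conditioned on the sentence.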

Code & Models

Repositories

yytzsy/SCDM (TensorFlow, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization

Methods: Convolution