Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention

Cristian Rodriguez-Opazo; Edison Marrese-Taylor; Fatemeh Sadat Saleh; Hongdong Li; Stephen Gould

arXiv:1908.07236 · cs.CV · March 13, 2020

TL;DR

This paper introduces a proposal-free, end-to-end trainable method for localizing specific moments in videos based on natural language queries, improving efficiency and accuracy over previous propose-and-rank approaches.

Contribution

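The paper proposes a proposal-free approach built on three components: a dynamic filter that transfers language information to the visual domain, a new loss function that guides the model to attend to the most relevant parts of the video, and soft labels that model annotation uncertainty.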

Findings

01. Outperforms state-of-the-art methods on the Charades-STA and ActivityNet-Captions datasets

02. Efficient, end-to-end trainable model

03. Effective handling of annotation uncertainty
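
To make the idea of handling annotation uncertainty with soft labels concrete, here is a minimal sketch. It assumes that a hard start (or end) index is replaced by a discrete Gaussian centered on the annotated step, and that a predicted distribution is compared to it with a KL divergence. The function names, the Gaussian shape, and the `sigma` parameter are illustrative assumptions, not details taken from this page.

```python
import math

def soft_label(num_steps, center, sigma=1.0):
    """Hypothetical helper: discrete Gaussian over time steps, normalized to sum to 1."""
    weights = [math.exp(-0.5 * ((t - center) / sigma) ** 2) for t in range(num_steps)]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) between two discrete distributions of equal length."""
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(target, predicted)
        if p > 0
    )

# Assumed example: 10 time steps, annotated start at step 3.
start_target = soft_label(num_steps=10, center=3, sigma=1.0)
uniform_prediction = [1.0 / 10] * 10
loss = kl_divergence(start_target, uniform_prediction)
```

With a soft target, a prediction that is off by one step is penalized less than one that is far away, which is one way to tolerate annotators disagreeing about exact boundaries.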

Abstract

This paper studies the problem of temporal moment localization in a long untrimmed video using natural language as the query. Given an untrimmed video and a query sentence, the goal is to determine the start and end of the visual moment in the video that corresponds to the query. While previous works have tackled this task with a propose-and-rank approach, we introduce a more efficient, end-to-end trainable, proposal-free approach that relies on three key components: a dynamic filter to transfer language information to the visual domain, a new loss function to guide our model to attend to the most relevant parts of the video, and soft labels to model annotation uncertainty. We evaluate our method on two benchmark datasets, Charades-STA and ActivityNet-Captions. Experimental results show that our approach outperforms state-of-the-art methods on…
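
The first two components named in the abstract can be illustrated with a rough sketch. It assumes the dynamic filter is a linear projection of a sentence embedding into the visual feature space, that attention is a softmax over the filter's dot product with each clip feature, and that the guiding loss penalizes attention placed outside the annotated segment. All names (`proj`, `dynamic_filter_attention`, `outside_attention_loss`) and the exact loss form are assumptions made for illustration and may differ from the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_filter_attention(video_feats, sentence_vec, proj):
    """Assumed form: build a filter from the sentence, then score every clip with it."""
    # proj is a (visual_dim x sentence_dim) matrix given as a list of rows.
    filt = [sum(w * s for w, s in zip(row, sentence_vec)) for row in proj]
    scores = [sum(f * v for f, v in zip(filt, feat)) for feat in video_feats]
    return softmax(scores)

def outside_attention_loss(attention, start, end, eps=1e-12):
    """Assumed form: penalize attention mass on clips outside [start, end] (inclusive)."""
    return -sum(
        math.log(1.0 - a + eps)
        for t, a in enumerate(attention)
        if t < start or t > end
    )

# Toy inputs: 4 clips with 2-d features, a 3-d sentence vector, a 2x3 projection.
video_feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 2.0], [1.0, 0.0]]
sentence_vec = [1.0, 0.0, 0.0]
proj = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

attention = dynamic_filter_attention(video_feats, sentence_vec, proj)
loss_matching = outside_attention_loss(attention, start=1, end=2)
loss_mismatched = outside_attention_loss(attention, start=0, end=0)
```

In this toy setup the attention concentrates on clips 1 and 2, so the loss is lower when the annotated segment covers those clips than when it covers only clip 0.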

Code & Models

Repositories

crodriguezo/TMLGA (PyTorch, official)
