DORi: Discovering Object Relationship for Moment Localization of a Natural-Language Query in Video

Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Basura Fernando, Hongdong Li, and Stephen Gould

arXiv:2010.06260 · cs.CV · October 14, 2020 · 1 citation

TL;DR

This paper introduces DORi, a novel approach for localizing specific moments in videos based on natural language queries by learning a language-conditioned video embedding that captures object, human, and activity relationships.

Contribution

The paper proposes a new message-passing algorithm that models spatial and temporal relationships in videos conditioned on language queries for improved moment localization.

Findings

01

Outperforms state-of-the-art on three benchmark datasets

02

Introduces YouCookII as a new benchmark for this task

03

Effective in capturing complex object-human-activity relationships

Abstract

This paper studies the task of temporal moment localization in a long untrimmed video using a natural language query. Given a query sentence, the goal is to determine the start and end of the relevant segment within the video. Our key innovation is to learn a video feature embedding through a language-conditioned message-passing algorithm suitable for temporal moment localization, which captures the relationships between humans, objects, and activities in the video. These relationships are obtained by a spatial sub-graph that contextualizes the scene representation using detected object and human features conditioned on the language query. Moreover, a temporal sub-graph captures the activities within the video through time. Our method is evaluated on three standard benchmark datasets, and we also introduce YouCookII as a new benchmark for this task. Experiments show our method outperforms…
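The spatial and temporal sub-graphs described in the abstract can be pictured with a toy sketch. The code below is not the authors' algorithm. It is a minimal illustration that assumes dot-product attention between node features and a query vector for the spatial step, and a simple neighbour average for the temporal step. The function names (`spatial_update`, `temporal_update`) and the feature vectors are hypothetical and chosen only for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_update(scene, nodes, query):
    """One hypothetical language-conditioned message-passing step for one frame.

    Each object/human node sends its features to the scene node, weighted by
    how well it matches the query (assumed: dot-product attention).
    """
    weights = softmax([dot(node, query) for node in nodes])
    message = [
        sum(w * node[d] for w, node in zip(weights, nodes))
        for d in range(len(scene))
    ]
    updated = [s + m for s, m in zip(scene, message)]
    return updated, weights


def temporal_update(frames):
    """Hypothetical temporal step: average each frame with its direct neighbours."""
    out = []
    for t in range(len(frames)):
        window = frames[max(0, t - 1): t + 2]
        out.append(
            [sum(f[d] for f in window) / len(window) for d in range(len(frames[t]))]
        )
    return out


# Illustrative inputs (made up): two detected nodes and a query aligned with the first.
scene = [0.0, 0.0]
nodes = [[1.0, 0.0], [0.0, 1.0]]
query = [2.0, 0.0]

updated_scene, weights = spatial_update(scene, nodes, query)
smoothed = temporal_update([[0.0], [3.0], [6.0]])
```

In this toy setup the node that matches the query receives the larger attention weight, so the updated scene vector leans toward it. A full model would learn these interactions and predict the start and end of the segment from the resulting per-frame embeddings.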

Code & Models

Repositories

crodriguezo/dori
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization