Saying What You're Looking For: Linguistics Meets Video Search

Andrei Barbu; N. Siddharth; Jeffrey Mark Siskind

arXiv:1309.5174 · cs.CV · September 23, 2013 · 1 citation

Open Access

TL;DR

This paper introduces a linguistically informed video search method that retrieves clips matching complex natural-language queries by combining compositional semantics with object detection and tracking.

Contribution

It presents a novel approach that integrates compositional semantics with object tracking to improve video search for complex natural-language queries.

Findings

01

Searched for 141 queries involving people and horses.

02

Achieved accurate retrieval of video clips depicting complex interactions.

03

Demonstrated the approach on 10 Hollywood movies.

Abstract

We present an approach to searching large video corpora for video clips which depict a natural-language query in the form of a sentence. This approach uses compositional semantics to encode subtle meaning that is lost in other systems, such as the difference between two sentences which have identical words but entirely different meaning: "The person rode the horse" vs. "The horse rode the person". Given a video-sentence pair and a natural-language parser, along with a grammar that describes the space of sentential queries, we produce a score which indicates how well the video depicts the sentence. We produce such a score for each video clip in a corpus and return a ranked list of clips. Furthermore, this approach addresses two fundamental problems simultaneously: detecting and tracking objects, and recognizing whether those tracks depict the query. Because both tracking and object…
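The contrast drawn in the abstract can be illustrated with a small Python sketch. It is not the authors' method: the functions `bag_of_words`, `agent_verb_patient`, and `rank_clips`, the clip names, and the scores are all hypothetical, and the "parse" only handles sentences of the form "The X rode the Y". It shows that a word-count representation cannot tell the two example sentences apart while a role-ordered one can, and how per-clip scores would be turned into a ranked list.

```python
from collections import Counter


def bag_of_words(sentence):
    """Order-free representation: word counts only."""
    return Counter(sentence.lower().rstrip(".").split())


def agent_verb_patient(sentence):
    """Toy role assignment for sentences shaped like 'The X rode the Y'.

    Assumption: a fixed five-word pattern stands in for a real parser.
    """
    words = sentence.lower().rstrip(".").split()
    return (words[1], words[2], words[4])


def rank_clips(scores):
    """Return clip names sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


a = "The person rode the horse"
b = "The horse rode the person"

# Identical words, so the bag-of-words view treats them as the same query.
same_bag = bag_of_words(a) == bag_of_words(b)

# Different agent and patient, so the role-ordered view separates them.
same_roles = agent_verb_patient(a) == agent_verb_patient(b)

# Hypothetical video-sentence scores for three clips (made-up numbers).
example_scores = {"clip_a": 0.2, "clip_b": 0.9, "clip_c": 0.5}
ranking = rank_clips(example_scores)
```

Here `same_bag` is true and `same_roles` is false, which is the distinction the abstract attributes to compositional semantics. In the paper the score comes from jointly tracking objects and matching the tracks against the sentence, not from a lookup table as in this sketch.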

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques