A Faster Method for Tracking and Scoring Videos Corresponding to Sentences

Haonan Yu; Daniel P. Barrett; Jeffrey Mark Siskind

arXiv:1411.4064·cs.CV·November 18, 2014

TL;DR

This paper presents a faster method for optimizing the sentence tracker's cost function. It reduces both space and time complexity from exponential to polynomial in the sentence length while producing a qualitatively identical result, so video-sentence matching can scale to larger problems.

Contribution

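An improved method for optimizing the sentence tracker's cost function that reduces space and time complexity from exponential to polynomial in the sentence length while producing a qualitatively identical result. Because it is plug-compatible with the prior method, it supports the same video-sentence applications at larger scale.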

Findings

01

Reduced space complexity from exponential to polynomial in the sentence length

02

Produced a result qualitatively identical to that of the prior method

03

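Allowed video retrieval with sentential queries, sentential description of video clips, and sentence-guided tracking to scale to larger numbers of object detections and HMM states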

Abstract

Prior work presented the sentence tracker, a method for scoring how well a sentence describes a video clip or alternatively how well a video clip depicts a sentence. We present an improved method for optimizing the same cost function employed by this prior work, reducing the space complexity from exponential in the sentence length to polynomial, as well as producing a qualitatively identical result in time polynomial in the sentence length instead of exponential. Since this new method is plug-compatible with the prior method, it can be used for the same applications: video retrieval with sentential queries, generating sentential descriptions of video clips, and focusing the attention of a tracker with a sentence, while allowing these applications to scale with significantly larger numbers of object detections, word meanings modeled with HMMs with significantly larger numbers of states,…
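To give a feel for what "exponential in the sentence length" means in practice, here is a minimal Python sketch. It assumes, purely for illustration, that each word contributes one HMM with the same number of states and that the states of all words are tracked jointly as a cross product. The function names and the per-word sum used for comparison are hypothetical; the sum is only an example of a polynomial quantity, not the complexity bound achieved by the paper.

```python
def joint_state_count(num_words: int, states_per_word: int) -> int:
    """Size of a cross-product state space: one state choice per word HMM.

    Assumption for illustration: every word HMM has the same number of states.
    This grows exponentially with the number of words.
    """
    return states_per_word ** num_words


def separate_state_count(num_words: int, states_per_word: int) -> int:
    """Total states if each word HMM were stored on its own.

    Hypothetical comparison point only: it shows linear (polynomial) growth,
    and is NOT the bound claimed by the paper.
    """
    return num_words * states_per_word


if __name__ == "__main__":
    states = 3  # assumed number of states per word HMM
    for words in (2, 4, 8, 16):
        print(words,
              joint_state_count(words, states),
              separate_state_count(words, states))
```

With three states per word, moving from 4 to 16 words raises the joint count from 81 to more than 43 million, whereas the per-word sum rises only from 12 to 48. This kind of gap is why a polynomial method can handle longer sentences and larger HMMs.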

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition