A Faster Method for Tracking and Scoring Videos Corresponding to Sentences
Haonan Yu, Daniel P. Barrett, Jeffrey Mark Siskind

TL;DR
This paper introduces an optimized algorithm for the sentence tracker that reduces both space and time complexity from exponential to polynomial in the sentence length, making video-sentence matching more scalable while producing a qualitatively identical result.
Contribution
An improved optimization method for the sentence tracker that reduces space and time complexity from exponential to polynomial in the sentence length while producing a qualitatively identical result, allowing video-sentence applications to scale.
Findings
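Reduced space complexity from exponential to polynomial in the sentence length
Reduced running time from exponential to polynomial in the sentence length
Produced results qualitatively identical to those of the prior method
Enabled video retrieval, description, and tracker-attention applications to scale to more object detections and word HMMs with more states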
Abstract
Prior work presented the sentence tracker, a method for scoring how well a sentence describes a video clip or alternatively how well a video clip depicts a sentence. We present an improved method for optimizing the same cost function employed by this prior work, reducing the space complexity from exponential in the sentence length to polynomial, as well as producing a qualitatively identical result in time polynomial in the sentence length instead of exponential. Since this new method is plug-compatible with the prior method, it can be used for the same applications: video retrieval with sentential queries, generating sentential descriptions of video clips, and focusing the attention of a tracker with a sentence, while allowing these applications to scale with significantly larger numbers of object detections, word meanings modeled with HMMs with significantly larger numbers of states,…
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition
