Searching a Raw Video Database using Natural Language Queries
Sriram Krishna, Siddarth Vinay, Srinivas K S

TL;DR
This paper presents an end-to-end system that enables searching large raw video databases using natural language voice queries by generating captions with neural networks.
Contribution
It introduces a novel pipeline combining RNNs and CNNs to generate video captions for efficient natural language-based video retrieval.
Findings
Effective caption generation for videos
Supports voice-based natural language queries
Improves video search accuracy
Abstract
The number of videos being produced and consequently stored in databases for video streaming platforms has been increasing exponentially over time. This vast database should be easily index-able to find the requisite clip or video to match the given search specification, preferably in the form of a textual query. This work aims to provide an end-to-end pipeline to search a video database with a voice query from the end user. The pipeline makes use of Recurrent Neural Networks in combination with Convolutional Neural Networks to generate captions of the video clips present in the database.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsVideo Analysis and Summarization · Music and Audio Processing · Multimodal Machine Learning Applications
