Scene-driven Retrieval in Edited Videos using Aesthetic and Semantic Deep Features
Lorenzo Baraldi, Costantino Grana, Rita Cucchiara

TL;DR
This paper introduces a scene-driven video retrieval method that uses deep semantic and aesthetic features to identify and visualize the most significant parts of edited videos in response to textual queries.
Contribution
It proposes a novel retrieval pipeline that segments edited videos into coherent scenes, retrieves the scenes most relevant to a textual query using deep learning, and represents them with thumbnails ranked by deep features.
Findings
Effective retrieval of the most significant scenes for a textual query
Thumbnails that are both semantically meaningful and aesthetically remarkable
Qualitative and quantitative experiments on a collection of edited videos demonstrate the effectiveness of the approach
Abstract
This paper presents a novel retrieval pipeline for video collections, which aims to retrieve the most significant parts of an edited video for a given query, and represent them with thumbnails which are at the same time semantically meaningful and aesthetically remarkable. Videos are first segmented into coherent and story-telling scenes, then a retrieval algorithm based on deep learning is proposed to retrieve the most significant scenes for a textual query. A ranking strategy based on deep features is finally used to tackle the problem of visualizing the best thumbnail. Qualitative and quantitative experiments are conducted on a collection of edited videos to demonstrate the effectiveness of our approach.
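The retrieval and thumbnail-selection stages described in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes scenes have already been segmented, that the textual query and every keyframe have been embedded into a shared vector space by some deep model, and that each keyframe carries a precomputed aesthetic score in [0, 1]. The names `Scene`, `Keyframe`, `retrieve`, `pick_thumbnail`, and the weight `alpha` are hypothetical. Max-pooled cosine similarity and a linear semantic/aesthetic blend are simple stand-ins and are not the ranking strategy described in the paper.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical sketch: NOT the paper's method. Embeddings and aesthetic
# scores are assumed to be precomputed by unspecified deep models.


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


@dataclass
class Keyframe:
    semantic: List[float]  # assumed semantic embedding, same space as the query
    aesthetic: float       # assumed aesthetic score in [0, 1]


@dataclass
class Scene:
    scene_id: str
    keyframes: List[Keyframe] = field(default_factory=list)


def scene_relevance(query: List[float], scene: Scene) -> float:
    """Score a scene by its best-matching keyframe (max-pooling over frames)."""
    return max((cosine(query, kf.semantic) for kf in scene.keyframes), default=0.0)


def pick_thumbnail(query: List[float], scene: Scene, alpha: float = 0.7) -> int:
    """Index of the keyframe maximizing alpha*semantic + (1-alpha)*aesthetic."""
    scores = [
        alpha * cosine(query, kf.semantic) + (1.0 - alpha) * kf.aesthetic
        for kf in scene.keyframes
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def retrieve(
    query: List[float], scenes: List[Scene], top_k: int = 3, alpha: float = 0.7
) -> List[Tuple[str, float, int]]:
    """Rank non-empty scenes by relevance; return (scene_id, relevance, thumbnail_index)."""
    scored = [(scene_relevance(query, s), s) for s in scenes if s.keyframes]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # sort on score only
    return [(s.scene_id, rel, pick_thumbnail(query, s, alpha)) for rel, s in scored[:top_k]]


if __name__ == "__main__":
    query = [1.0, 0.0]
    scenes = [
        Scene("beach", [Keyframe([0.9, 0.1], 0.2), Keyframe([0.8, 0.2], 0.9)]),
        Scene("city", [Keyframe([0.0, 1.0], 0.8)]),
    ]
    print(retrieve(query, scenes, top_k=1))
```

In this toy run the "beach" scene ranks first, and its second keyframe is chosen as the thumbnail: it matches the query slightly less well than the first but has a much higher aesthetic score. Raising `alpha` favors thumbnails that match the query; lowering it favors visually appealing frames.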
