Learning Pixel-Level Distinctions for Video Highlight Detection
Fanyue Wei, Biao Wang, Tiezheng Ge, Yuning Jiang, Wen Li, Lixin Duan

TL;DR
This paper introduces a pixel-level distinction learning approach for video highlight detection, leveraging 3D CNNs and saliency to better model content relevance and improve detection accuracy.
Contribution
It proposes a novel encoder-decoder network that explicitly models pixel-level distinctions using temporal and spatial context for enhanced highlight detection.
Findings
Achieves state-of-the-art results on three benchmarks.
Effectively models temporal and spatial relations in videos.
Provides interpretable content distinctions for highlights.
Abstract
The goal of video highlight detection is to select the most attractive segments from a long video to depict the most interesting parts of the video. Existing methods typically focus on modeling the relationships between different video segments in order to learn a model that can assign highlight scores to these segments; however, these approaches do not explicitly consider the contextual dependency within individual segments. To this end, we propose to learn pixel-level distinctions to improve video highlight detection. This pixel-level distinction indicates whether or not each pixel in one video belongs to an interesting section. The advantages of modeling such fine-level distinctions are two-fold. First, it allows us to exploit the temporal and spatial relations of the content in one video, since the distinction of a pixel in one frame is highly dependent on both the content before…
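To make the idea of a pixel-level distinction concrete, here is a minimal sketch of how per-pixel predictions could be turned into segment-level highlight scores. It is an illustration under stated assumptions, not the authors' implementation: the averaging rule, the fixed non-overlapping segments, and the names `frame_score`, `segment_scores`, and `top_segments` are all hypothetical, and the distinction maps are assumed to be already produced by some network as values in [0, 1].

```python
from typing import List, Sequence

# One frame's distinction map: H x W values in [0, 1], where a higher value
# means the pixel is more likely to belong to an interesting section.
Frame = Sequence[Sequence[float]]


def frame_score(frame: Frame) -> float:
    """Mean of the per-pixel distinction values in one frame (assumed pooling rule)."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count if count else 0.0


def segment_scores(frames: Sequence[Frame], segment_len: int) -> List[float]:
    """Average frame scores over consecutive, non-overlapping segments."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    per_frame = [frame_score(f) for f in frames]
    scores = []
    for start in range(0, len(per_frame), segment_len):
        chunk = per_frame[start:start + segment_len]
        scores.append(sum(chunk) / len(chunk))
    return scores


def top_segments(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring segments, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```

For example, calling `segment_scores(maps, 2)` on four 2x2 maps groups them into two segments, and `top_segments(scores, 1)` then returns the index of the segment to treat as the highlight.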
Taxonomy
Topics: Video Analysis and Summarization · Visual Attention and Saliency Detection · Image and Video Quality Assessment
