Less is More: Learning Highlight Detection from Video Duration
Bo Xiong, Yannis Kalantidis, Deepti Ghadiyaram, Kristen Grauman

TL;DR
This paper introduces an unsupervised highlight detection method that uses video duration as an implicit supervision signal and outperforms prior unsupervised methods on two public benchmarks.
Contribution
The paper proposes a novel ranking framework that treats video duration as implicit supervision, so a highlight detector can be trained at scale without manual annotations.
Findings
Outperforms state-of-the-art unsupervised highlight detection methods.
Successfully trained on 10 million hashtagged Instagram videos.
Demonstrates effectiveness on two public benchmarks.
Abstract
Highlight detection has the potential to significantly ease video browsing, but existing methods often suffer from expensive supervision requirements, where human viewers must manually identify highlights in training videos. We propose a scalable unsupervised solution that exploits video duration as an implicit supervision signal. Our key insight is that video segments from shorter user-generated videos are more likely to be highlights than those from longer videos, since users tend to be more selective about the content when capturing shorter videos. Leveraging this insight, we introduce a novel ranking framework that prefers segments from shorter videos, while properly accounting for the inherent noise in the (unlabeled) training data. We use it to train a highlight detector with 10M hashtagged Instagram videos. In experiments on two challenging public video highlight detection…
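The ranking idea in the abstract can be illustrated with a minimal sketch. It assumes a plain pairwise hinge loss with a margin of 1.0 and hypothetical duration thresholds (`short_max`, `long_min`); none of these names or values come from the paper, and the paper's handling of noisy training pairs is not modeled here.

```python
from itertools import product


def split_by_duration(videos, short_max=15.0, long_min=45.0):
    """Group segment scores by the duration of their source video.

    `videos` is a list of (duration_seconds, segment_scores) tuples.
    The thresholds are hypothetical, chosen only for illustration.
    Videos between the two thresholds are ignored.
    """
    short = [s for d, scores in videos if d <= short_max for s in scores]
    long_ = [s for d, scores in videos if d >= long_min for s in scores]
    return short, long_


def hinge_ranking_loss(short_scores, long_scores, margin=1.0):
    """Mean pairwise hinge loss over all (short, long) segment pairs.

    A pair contributes zero loss when the short-video segment outscores
    the long-video segment by at least `margin`.
    """
    pairs = list(product(short_scores, long_scores))
    if not pairs:
        return 0.0
    return sum(max(0.0, margin - s + l) for s, l in pairs) / len(pairs)


# Toy example: scores would normally come from a learned model.
videos = [
    (10.0, [2.0, 1.5]),  # short video
    (60.0, [0.0, 1.0]),  # long video
    (30.0, [5.0]),       # mid-length video, excluded from pairs
]
short_scores, long_scores = split_by_duration(videos)
loss = hinge_ranking_loss(short_scores, long_scores)
```

In a real training loop the scores would be produced by a neural network and the loss minimized by gradient descent; the plain floats above only show how pairs are formed and penalized.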
Taxonomy
Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Video Coding and Compression Technologies
