Video Summarization by Learning from Unpaired Data
Mrigank Rochan, Yang Wang

TL;DR
This paper introduces a novel method for video summarization that learns from unpaired data using adversarial training and diversity constraints, reducing the need for costly labeled datasets.
Contribution
It proposes an unpaired learning framework for video summarization that does not require aligned training data (raw videos matched with their own ground-truth summaries), combining adversarial training with diversity constraints.
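To make the idea of a diversity constraint concrete, here is a minimal sketch, assuming frames are represented as feature vectors and that redundancy is measured by mean pairwise cosine similarity among the selected frames. The function names (`cosine_similarity`, `diversity_penalty`) and the toy features are hypothetical illustrations; this page does not give the paper's actual loss, so this should not be read as the authors' formulation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def diversity_penalty(features):
    """Mean pairwise cosine similarity of the selected frames.

    Hypothetical illustration: a lower value means the selected frames
    are more varied, so a summarizer could be trained to minimize it.
    """
    n = len(features)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += cosine_similarity(features[i], features[j])
    return total / (n * (n - 1) / 2)


# Toy 2-D "frame features" (made up for illustration only).
redundant = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
varied = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

redundant_score = diversity_penalty(redundant)
varied_score = diversity_penalty(varied)
```

In this sketch a summary made of near-identical frames scores higher (worse) than one whose frames differ, which is the behavior a diversity term is meant to discourage. In an adversarial setup such a term would typically be added to the generator's objective alongside the discriminator-based loss.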
Findings
Outperforms existing methods on benchmark datasets
Effectively learns from unpaired raw and summary videos
Reduces reliance on expensive labeled training data
Abstract
We consider the problem of video summarization. Given an input raw video, the goal is to select a small subset of key frames from the input video to create a shorter summary video that best describes the content of the original video. Most of the current state-of-the-art video summarization approaches use supervised learning and require labeled training data. Each training instance consists of a raw input video and its ground truth summary video curated by human annotators. However, it is very expensive and difficult to create such labeled training examples. To address this limitation, we propose a novel formulation to learn video summarization from unpaired data. We present an approach that learns to generate optimal video summaries using a set of raw videos ($V$) and a set of summary videos ($S$), where there exists no correspondence between $V$ and $S$. We argue that this type of…
