Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward
Kaiyang Zhou, Yu Qiao, Tao Xiang

TL;DR
This paper introduces an unsupervised deep reinforcement learning approach to video summarization. It rewards diversity and representativeness without using labels, outperforms state-of-the-art unsupervised methods, and matches or exceeds many supervised ones.
Contribution
The paper presents a novel unsupervised reinforcement learning framework with a custom reward function for diversity and representativeness in video summarization.
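The page names the two reward criteria but gives no formula. Below is a minimal sketch of one way to score a frame selection. It assumes frames are already encoded as feature vectors and that the reward is the sum of two terms:

- a diversity term: the mean pairwise cosine dissimilarity (1 minus cosine similarity) among the selected frames;
- a representativeness term: exp of the negative mean Euclidean distance from every frame to its nearest selected frame.

These definitions and all function names are illustrative assumptions, not a reproduction of the authors' code. Refinements such as feature extraction or temporal constraints are omitted.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_dissimilarity(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; assumes non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return 1.0 - dot / (na * nb)


def diversity_reward(features: List[Vector], selected: List[int]) -> float:
    """Assumed diversity term: mean pairwise dissimilarity of selected frames."""
    n = len(selected)
    if n < 2:
        return 0.0  # a single frame (or none) has no pairs to compare
    total = sum(
        cosine_dissimilarity(features[i], features[j])
        for i in selected
        for j in selected
        if i != j
    )
    return total / (n * (n - 1))


def representativeness_reward(features: List[Vector], selected: List[int]) -> float:
    """Assumed representativeness term: exp(-mean distance to nearest selected frame)."""
    if not selected:
        return 0.0
    total = sum(min(math.dist(x, features[j]) for j in selected) for x in features)
    return math.exp(-total / len(features))


def total_reward(features: List[Vector], selected: List[int]) -> float:
    """Hypothetical combined reward: equal-weight sum of both terms."""
    return diversity_reward(features, selected) + representativeness_reward(features, selected)
```

Consider a selection of two distinct frames and a selection of two near-duplicates. The distinct pair scores higher on both terms. The duplicates add no diversity, and they leave the other distinct frame poorly covered.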
Findings
Outperforms state-of-the-art unsupervised methods
Comparable or superior to supervised approaches
Effective in producing diverse and representative summaries
Abstract
Video summarization aims to facilitate large-scale video browsing by producing short, concise summaries that are diverse and representative of original videos. In this paper, we formulate video summarization as a sequential decision-making process and develop a deep summarization network (DSN) to summarize videos. DSN predicts for each video frame a probability, which indicates how likely a frame is selected, and then takes actions based on the probability distributions to select frames, forming video summaries. To train our DSN, we propose an end-to-end, reinforcement learning-based framework, where we design a novel reward function that jointly accounts for diversity and representativeness of generated summaries and does not rely on labels or user interactions at all. During training, the reward function judges how diverse and representative the generated summaries are, while DSN…
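The abstract says DSN outputs a selection probability for each frame and samples actions from those probabilities. It is trained end-to-end with reinforcement learning. Below is a minimal sketch of that loop, under these assumptions:

- each frame's action is an independent Bernoulli draw;
- training uses a plain REINFORCE (likelihood-ratio) update with a constant baseline.

To keep the example self-contained, the network is replaced by one hypothetical free logit per frame. `reward_fn` stands in for the diversity-representativeness reward. The values of `episodes`, `lr` and `baseline` are illustrative, not the paper's settings.

```python
import math
import random
from typing import Callable, List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a logit to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_actions(probs: List[float], rng: random.Random) -> List[int]:
    """Select frame t (action 1) independently with probability probs[t]."""
    return [1 if rng.random() < p else 0 for p in probs]


def reinforce_step(
    logits: List[float],
    reward_fn: Callable[[List[int]], float],
    rng: random.Random,
    episodes: int = 5,
    lr: float = 0.1,
    baseline: float = 0.0,
) -> Tuple[List[float], float]:
    """One REINFORCE update on per-frame logits (gradient ascent on reward)."""
    probs = [sigmoid(z) for z in logits]
    grads = [0.0] * len(logits)
    rewards = []
    for _ in range(episodes):
        actions = sample_actions(probs, rng)
        r = reward_fn(actions)
        rewards.append(r)
        advantage = r - baseline
        for t, (a, p) in enumerate(zip(actions, probs)):
            # d/dz log Bernoulli(a; sigmoid(z)) = a - sigmoid(z)
            grads[t] += advantage * (a - p)
    new_logits = [z + lr * g / episodes for z, g in zip(logits, grads)]
    return new_logits, sum(rewards) / episodes
```

In an actual model, the `advantage * (a - p)` term would be back-propagated through the network that produced the logits. A moving-average baseline, rather than a constant one, is a common way to reduce the variance of this estimator.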
Taxonomy
Topics: Video Analysis and Summarization · Music and Audio Processing · Human Pose and Action Recognition
