Unsupervised Video Summarization with a Convolutional Attentive Adversarial Network
Guoqiang Liang, Yanbing Lv, Shucheng Li, Shizhou Zhang, Yanning Zhang

TL;DR
This paper introduces an unsupervised convolutional attentive adversarial network (CAAN) for video summarization that predicts key frames without human annotations and outperforms existing unsupervised methods on the SumMe and TVSum benchmarks.
Contribution
Proposes a novel unsupervised deep learning framework using adversarial training with attention mechanisms for video summarization.
Findings
Outperforms state-of-the-art unsupervised methods on SumMe and TVSum datasets.
Achieves results comparable to or better than those of some supervised approaches.
Demonstrates effectiveness of attention-based importance scoring in video summarization.
Abstract
With the explosive growth of video data, video summarization, which attempts to seek the minimum subset of frames while still conveying the main story, has become one of the hottest topics. Substantial achievements have been made by supervised learning techniques, especially after the emergence of deep learning. However, it is extremely expensive and difficult to collect human annotations for large-scale video datasets. To address this problem, we propose a convolutional attentive adversarial network (CAAN), whose key idea is to build a deep summarizer in an unsupervised way. Built on the generative adversarial network, our overall framework consists of a generator and a discriminator. The former predicts importance scores for all frames of a video, while the latter tries to distinguish the score-weighted frame features from the original frame features. Specifically, the generator…
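The generator/discriminator interplay described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the toy shapes, and the use of the standard GAN log-loss are assumptions made here for illustration only, and the abstract does not specify the actual network layers or objective.

```python
import numpy as np


def score_weighted_features(features, scores):
    """Scale each frame's feature vector by its importance score.

    features: array of shape (T, D), one D-dim feature per frame.
    scores:   array of shape (T,), importance scores (assumed in [0, 1]).
    """
    features = np.asarray(features, dtype=float)
    scores = np.asarray(scores, dtype=float)
    if features.ndim != 2 or scores.shape != (features.shape[0],):
        raise ValueError("expected features (T, D) and scores (T,)")
    return features * scores[:, None]


def adversarial_losses(d_real, d_fake, eps=1e-8):
    """Standard GAN log-losses (an assumption, not taken from the paper).

    d_real: discriminator output in (0, 1) for the original features.
    d_fake: discriminator output in (0, 1) for the score-weighted features.
    """
    # Discriminator wants d_real -> 1 and d_fake -> 0.
    d_loss = -(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    # Generator wants the weighted features to be taken for the originals.
    g_loss = -np.log(d_fake + eps)
    return d_loss, g_loss


# Toy usage: 3 frames with 2-dim features and hypothetical scores.
frames = np.ones((3, 2))
weighted = score_weighted_features(frames, [0.0, 0.5, 1.0])
d_loss, g_loss = adversarial_losses(d_real=0.5, d_fake=0.5)
```

In this sketch the generator is rewarded when the discriminator cannot tell the score-weighted sequence from the original one, which is the intuition the abstract gives for learning importance scores without annotations.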
Taxonomy
Topics: Video Analysis and Summarization · Music and Audio Processing · Digital Media Forensic Detection
