Video Summarization using Denoising Diffusion Probabilistic Model
Zirui Shang, Yubo Zhu, Hongxi Li, Shuo Yang, Xinxiao Wu

TL;DR
This paper introduces a generative video summarization method based on Denoising Diffusion Probabilistic Models, which reduces the impact of annotation noise and generalizes better than traditional discriminative approaches.
Contribution
The paper proposes a diffusion-based generative framework for video summarization that is more robust to subjective annotation noise and less prone to overfitting, together with an unsupervised pretraining strategy.
Findings
Outperforms existing methods on TVSum, SumMe, and FPVSum datasets.
Demonstrates robustness to annotation noise and limited data.
Achieves strong generalization in video summarization tasks.
Abstract
Video summarization aims to eliminate visual redundancy while retaining key parts of video to construct concise and comprehensive synopses. Most existing methods use discriminative models to predict the importance scores of video frames. However, these methods are susceptible to annotation inconsistency caused by the inherent subjectivity of different annotators when annotating the same video. In this paper, we introduce a generative framework for video summarization that learns how to generate summaries from a probability distribution perspective, effectively reducing the interference of subjective annotation noise. Specifically, we propose a novel diffusion summarization method based on the Denoising Diffusion Probabilistic Model (DDPM), which learns the probability distribution of training data through noise prediction, and generates summaries by iterative denoising. Our method is…
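The noise-prediction training and iterative-denoising generation mentioned in the abstract can be illustrated with the standard DDPM equations applied to a vector of per-frame importance scores. This is a minimal sketch under stated assumptions, not the authors' implementation: the linear beta schedule, the step count, the function names, and the `oracle_eps` stand-in are all hypothetical. In practice the noise predictor would be a trained network conditioned on video features; the oracle here only exists so that the loop runs end to end without training.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    # Linear variance schedule (an assumption; other schedules are possible).
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def q_sample(x0, t, alpha_bars, noise):
    # Forward process: noise clean scores x0 directly to step t.
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

def noise_prediction_loss(eps_model, x0, t, alpha_bars, noise):
    # Training objective: mean squared error between true and predicted noise.
    x_t = q_sample(x0, t, alpha_bars, noise)
    return float(np.mean((noise - eps_model(x_t, t)) ** 2))

def p_sample_loop(eps_model, shape, betas, alphas, alpha_bars, rng):
    # Reverse process: start from Gaussian noise and denoise step by step.
    x = rng.standard_normal(shape)
    for t in reversed(range(len(betas))):
        eps_hat = eps_model(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean  # no noise is added at the final step
    return x

def make_oracle_eps(x0, alpha_bars):
    # Hypothetical stand-in for a trained network: it recovers the exact
    # noise because it is given the clean scores x0, which a real model is not.
    def oracle_eps(x_t, t):
        return (x_t - np.sqrt(alpha_bars[t]) * x0) / np.sqrt(1.0 - alpha_bars[t])
    return oracle_eps

rng = np.random.default_rng(0)
betas, alphas, alpha_bars = make_schedule()
frame_scores = np.array([0.1, 0.9, 0.8, 0.2, 0.0, 0.7, 0.3, 0.5])  # toy importance scores
eps_model = make_oracle_eps(frame_scores, alpha_bars)
generated = p_sample_loop(eps_model, frame_scores.shape, betas, alphas, alpha_bars, rng)
```

With the oracle predictor the reverse loop returns the toy scores, which only confirms that the update rule is wired correctly. A learned predictor would instead produce samples from the distribution of plausible summaries, which is the property the abstract relies on to reduce the influence of any single annotator.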
Taxonomy
Topics: Video Analysis and Summarization
Methods: Diffusion
