EDSNet: Efficient-DSNet for Video Summarization
Ashish Prasad, Pranav Jeevan, Amit Sethi

TL;DR
This paper introduces EDSNet, an efficient video summarization model that replaces transformer attention with resource-friendly mechanisms like Fourier and Wavelet transforms, reducing computational costs while maintaining performance.
Contribution
The work presents a novel, resource-efficient architecture for video summarization that outperforms transformer-based methods in efficiency and scalability.
Findings
Significant reduction in computational costs.
Maintains competitive summarization performance.
Effective use of Fourier and Wavelet transforms for token mixing.
Abstract
Current video summarization methods largely rely on transformer-based architectures, which, due to their quadratic complexity, require substantial computational resources. In this work, we address these inefficiencies by enhancing the Direct-to-Summarize Network (DSNet) with more resource-efficient token mixing mechanisms. We show that replacing traditional attention with alternatives like Fourier, Wavelet transforms, and Nyströmformer improves efficiency and performance. Furthermore, we explore various pooling strategies within the Regional Proposal Network, including ROI pooling, Fast Fourier Transform pooling, and flat pooling. Our experimental results on TVSum and SumMe datasets demonstrate that these modifications significantly reduce computational costs while maintaining competitive summarization performance. Thus, our work offers a more scalable solution for video summarization…
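The idea of replacing attention with a Fourier-based token mixer can be illustrated with a minimal sketch. It assumes an FNet-style layer: a 2D discrete Fourier transform over the frame and feature axes, keeping only the real part. The abstract does not specify EDSNet's exact layer, so the function name `fourier_token_mixing` and the input shape below are hypothetical and chosen only for illustration.

```python
import numpy as np


def fourier_token_mixing(x: np.ndarray) -> np.ndarray:
    """Mix information across tokens without attention (FNet-style assumption).

    x: array of shape (num_frames, feature_dim), one feature vector per frame.
    Returns an array of the same shape in which every output entry depends on
    every input entry.
    """
    # fft2 applies the FFT along the last two axes (frames and features).
    # Keeping the real part lets the result feed a standard real-valued layer.
    return np.fft.fft2(x).real


# Hypothetical input: 8 frames, each with a 16-dimensional feature vector.
rng = np.random.default_rng(0)
frame_features = rng.standard_normal((8, 16))
mixed = fourier_token_mixing(frame_features)
```

The mixer has no learned parameters, and the FFT along the frame axis costs O(n log n) in the number of frames, compared with the quadratic cost of self-attention that the abstract identifies as the bottleneck.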
Taxonomy
Topics: Video Analysis and Summarization · Music and Audio Processing · Generative Adversarial Networks and Image Synthesis
Methods: Softmax · Attention Is All You Need
