EDSNet: Efficient-DSNet for Video Summarization

Ashish Prasad; Pranav Jeevan; Amit Sethi

arXiv:2409.14724 · cs.CV · September 24, 2024

TL;DR

This paper introduces EDSNet, an efficient video summarization model that replaces the transformer attention in DSNet with cheaper token mixing mechanisms such as Fourier and Wavelet transforms, reducing computational cost while maintaining competitive summarization performance.
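The TL;DR does not say how a Fourier transform can stand in for attention. One well-known parameter-free formulation is the FNet mixing layer, which keeps the real part of a 2-D discrete Fourier transform taken over the token and feature dimensions. The sketch below assumes an FNet-style layer; EDSNet's exact Fourier block may differ, and the helper names `fft` and `fourier_token_mixing` are hypothetical. It uses only the Python standard library, so the FFT is a small recursive Cooley-Tukey routine that falls back to a direct DFT for odd lengths.

```python
import cmath
from typing import List

Matrix = List[List[float]]


def fft(x: List[complex]) -> List[complex]:
    """Recursive Cooley-Tukey FFT; uses a direct DFT when the length is odd."""
    n = len(x)
    if n <= 1:
        return list(x)
    if n % 2:
        # Odd length: fall back to the O(n^2) definition of the DFT.
        return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
                for k in range(n)]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def fourier_token_mixing(x: Matrix) -> Matrix:
    """Assumed FNet-style mixing: Re(DFT over tokens of DFT over features).

    x has shape (seq_len, hidden), one row per frame token. The output has
    the same shape, and each output position is a global weighted sum over
    all tokens rather than a local one.
    """
    seq_len, hidden = len(x), len(x[0])
    # DFT along the hidden (feature) dimension of each token.
    rows = [fft([complex(v) for v in row]) for row in x]
    # DFT along the sequence dimension of each feature channel.
    cols = [fft([rows[t][d] for t in range(seq_len)]) for d in range(hidden)]
    # Keep only the real part so the block stays real-valued.
    return [[cols[d][t].real for d in range(hidden)] for t in range(seq_len)]


frames = [[0.1, 0.4, -0.2], [0.3, 0.0, 0.5], [-0.1, 0.2, 0.2], [0.0, -0.3, 0.1]]
mixed = fourier_token_mixing(frames)
print(len(mixed), len(mixed[0]))  # 4 3
```

The mixing has no learned weights, and for power-of-two lengths in this sketch the FFT costs O(n log n) per axis, which avoids building the O(n²) score matrix of self-attention. A real model would call an optimized FFT library rather than this pure-Python routine.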

Contribution

The work presents a novel, resource-efficient architecture for video summarization that outperforms transformer-based methods in efficiency and scalability.

Findings

01. Significant reduction in computational costs.
02. Maintains competitive summarization performance.
03. Effective use of Fourier and Wavelet transforms for token mixing.
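Finding 03 pairs the Fourier transform with a Wavelet transform. The simplest wavelet, the Haar wavelet, splits a sequence into pairwise sums (low-pass approximation) and pairwise differences (high-pass detail), scaled by 1/√2 so the transform is orthonormal and exactly invertible. The sketch below applies one level of a Haar transform along the token axis. It assumes the Haar basis and a single decomposition level, which may not match the wavelet family or depth EDSNet uses, and `haar_mix` / `haar_unmix` are hypothetical names.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_mix(x: Matrix) -> Tuple[Matrix, Matrix]:
    """One-level orthonormal Haar transform along the sequence (token) axis.

    x has shape (seq_len, hidden) with an even seq_len. Returns the
    approximation and detail coefficients, each of shape (seq_len // 2, hidden);
    every output row mixes two adjacent tokens.
    """
    if len(x) % 2:
        raise ValueError("seq_len must be even; pad the sequence first")
    approx: Matrix = []
    detail: Matrix = []
    for a, b in zip(x[0::2], x[1::2]):
        approx.append([(u + v) * INV_SQRT2 for u, v in zip(a, b)])  # low-pass
        detail.append([(u - v) * INV_SQRT2 for u, v in zip(a, b)])  # high-pass
    return approx, detail


def haar_unmix(approx: Matrix, detail: Matrix) -> Matrix:
    """Inverse of haar_mix: reconstructs the original token sequence."""
    out: Matrix = []
    for s, d in zip(approx, detail):
        out.append([(p + q) * INV_SQRT2 for p, q in zip(s, d)])
        out.append([(p - q) * INV_SQRT2 for p, q in zip(s, d)])
    return out


clip = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
low, high = haar_mix(clip)
print([round(v, 3) for v in low[0]])  # [2.828, 4.243]
```

Applying `haar_mix` again to the approximation coefficients yields coarser temporal scales (a multi-level decomposition), and the total cost stays linear in the sequence length.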

Abstract

Current video summarization methods largely rely on transformer-based architectures, which, due to their quadratic complexity, require substantial computational resources. In this work, we address these inefficiencies by enhancing the Direct-to-Summarize Network (DSNet) with more resource-efficient token mixing mechanisms. We show that replacing traditional attention with alternatives like Fourier, Wavelet transforms, and Nyströmformer improves efficiency and performance. Furthermore, we explore various pooling strategies within the Regional Proposal Network, including ROI pooling, Fast Fourier Transform pooling, and flat pooling. Our experimental results on TVSum and SumMe datasets demonstrate that these modifications significantly reduce computational costs while maintaining competitive summarization performance. Thus, our work offers a more scalable solution for video summarization…
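The abstract names ROI pooling as one of the pooling strategies explored in the proposal network. In a temporal setting, a proposal is a frame interval of arbitrary length, and ROI pooling turns it into a fixed number of bins so that every proposal yields a feature of the same shape. The sketch below is a generic 1-D adaptation of Fast R-CNN-style ROI max pooling, assuming integer frame boundaries; `roi_pool_1d` is a hypothetical name, and the FFT pooling and flat pooling variants are not reproduced because the abstract does not define them.

```python
from typing import List

Matrix = List[List[float]]


def roi_pool_1d(features: Matrix, start: int, end: int, bins: int) -> Matrix:
    """Max-pool the frames in [start, end) into `bins` bins.

    features has shape (num_frames, hidden). The result always has shape
    (bins, hidden), whatever the proposal length, so proposals of different
    durations can share one scoring head.
    """
    if not 0 <= start < end <= len(features):
        raise ValueError("need 0 <= start < end <= num_frames")
    if bins < 1:
        raise ValueError("bins must be positive")
    length = end - start
    pooled: Matrix = []
    for i in range(bins):
        # Bin i spans [floor(i*L/bins), ceil((i+1)*L/bins)) relative to start,
        # so it always covers at least one frame, even when bins > L.
        lo = start + (i * length) // bins
        hi = start + ((i + 1) * length + bins - 1) // bins
        window = features[lo:hi]
        # Channel-wise maximum over the frames in this bin.
        pooled.append([max(channel) for channel in zip(*window)])
    return pooled


frames = [[float(t), float(-t)] for t in range(10)]
print(roi_pool_1d(frames, 2, 8, 3))  # [[3.0, -2.0], [5.0, -4.0], [7.0, -6.0]]
```

Replacing the per-bin `max` with a mean gives ROI average pooling; in either case the output size is fixed by `bins`, not by the proposal length.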

Taxonomy

Topics: Video Analysis and Summarization · Music and Audio Processing · Generative Adversarial Networks and Image Synthesis

Methods: Softmax · Attention Is All You Need