FullTransNet: Full Transformer with Local-Global Attention for Video Summarization
Libin Lan, Lu Jiang, Tianshu Yu, Xiaojuan Liu, Zhongshi He

TL;DR
FullTransNet introduces a full (encoder-decoder) transformer architecture with local-global sparse attention for video summarization, improving both long-range dependency modeling and efficiency over existing methods.
Contribution
It proposes a full transformer with local-global attention for video summarization, addressing limitations of previous recurrent, convolutional, and encoder-only transformer approaches.
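To make the idea of local-global sparse attention concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the band-plus-global-token pattern, the function names `local_global_mask` and `masked_attention`, and the parameters `window` and `global_idx` are all hypothetical and are not taken from the paper.

```python
import numpy as np

def local_global_mask(seq_len, window, global_idx):
    """Boolean mask: True where query position i may attend to key position j.

    Assumed pattern (not confirmed by the source): a local band of
    half-width `window`, plus a few global positions that attend to
    everything and are attended to by everything.
    """
    idx = np.arange(seq_len)
    # Local band: each position sees neighbours at most `window` steps away.
    mask = np.abs(idx[:, None] - idx[None, :]) <= window
    # Global positions: full rows and full columns are opened up.
    g = np.asarray(global_idx, dtype=int)
    mask[g, :] = True
    mask[:, g] = True
    return mask

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted to the allowed pairs in `mask`."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Disallowed pairs get -inf so they receive zero weight after softmax.
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

if __name__ == "__main__":
    n, d = 512, 64
    mask = local_global_mask(n, window=8, global_idx=[0])
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
    out = masked_attention(q, k, v, mask)
    # Fraction of query-key pairs actually used, versus 1.0 for full attention.
    print(out.shape, mask.sum() / n**2)
```

Note that this dense-mask version still computes every score and only shows which pairs are kept; an efficient implementation would compute only the allowed pairs, which is where the savings over full attention would come from.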
Findings
Achieves state-of-the-art F-scores on SumMe and TVSum datasets.
Reduces computational costs compared to traditional full attention models.
Outperforms the second-best methods, though only by small margins.
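For readers unfamiliar with the metric, the following sketch shows one common way an F-score is computed for video summaries: as the harmonic mean of precision and recall over the frames selected by the predicted and reference summaries. The summary above does not state the exact evaluation protocol, so the frame-level overlap definition and the function name `summary_f_score` are assumptions made for illustration.

```python
def summary_f_score(pred, truth):
    """F-score between two per-frame binary selections (1 = frame in summary).

    Assumes frame-level overlap; benchmark protocols may additionally
    aggregate over several human reference summaries.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must cover the same frames")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    if overlap == 0:
        # No shared frames (or an empty summary): score is zero.
        return 0.0
    precision = overlap / sum(1 for p in pred if p)
    recall = overlap / sum(1 for t in truth if t)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 1, 0]
    print(summary_f_score(predicted, reference))
```

In this toy case one of the two predicted frames matches one of the two reference frames, so precision and recall are both one half.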
Abstract
Video summarization aims to generate a compact, informative, and representative synopsis of raw videos, which is crucial for browsing, analyzing, and understanding video content. Dominant approaches in video summarization primarily rely on recurrent or convolutional neural networks, and more recently on encoder-only transformer architectures. However, these methods typically suffer from several limitations in parallelism, modeling long-range dependencies, and providing explicit generative capabilities. To address these issues, we propose a transformer-like architecture named FullTransNet with two-fold ideas. First, it uses a full transformer with an encoder-decoder structure as an alternative architecture for video summarization. As the full transformer is specifically designed for sequence transduction tasks, its direct application to video summarization is both intuitive and…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques · Video Analysis and Summarization
Methods: Softmax · Attention Is All You Need
