FullTransNet: Full Transformer with Local-Global Attention for Video Summarization

Libin Lan; Lu Jiang; Tianshu Yu; Xiaojuan Liu; Zhongshi He

arXiv:2501.00882 · cs.CV · August 8, 2025 · 2 citations

TL;DR

FullTransNet introduces a full transformer architecture with local-global sparse attention for video summarization, improving modeling of dependencies and efficiency over existing methods.

Contribution

It proposes a novel full transformer with local-global attention for video summarization, addressing limitations of previous CNN and encoder-only transformer approaches.
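The idea of local-global sparse attention can be illustrated with a small sketch. It assumes one common pattern: each position attends to a sliding window of neighbors, and a few designated global positions attend to, and are attended by, every position. The paper's exact sparsity pattern may differ; the names `local_global_mask`, `masked_attention`, `attended_pairs`, `window`, and `global_idx` are hypothetical and chosen only for illustration.

```python
import math

def local_global_mask(n, window, global_idx):
    """Boolean n x n mask: mask[i][j] is True when query i may attend to key j.

    Assumed pattern (not taken from the paper): a sliding window of half-width
    `window` plus a set of global positions that see, and are seen by, all others.
    """
    g = set(global_idx)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            local = abs(i - j) <= window   # neighbors inside the sliding window
            is_global = i in g or j in g   # any pair involving a global position
            mask[i][j] = local or is_global
    return mask

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted to the pairs allowed by mask."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # The diagonal is always allowed, so this list is never empty.
        allowed = [j for j in range(len(k)) if mask[i][j]]
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d) for j in allowed]
        top = max(scores)                  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * v[j][c] for w, j in zip(weights, allowed))
            for c in range(len(v[0]))
        ])
    return out

def attended_pairs(mask):
    """Number of query-key pairs that are actually scored."""
    return sum(sum(row) for row in mask)

# Usage: 8 frames, window of 1, frame 0 treated as a global position.
mask = local_global_mask(8, window=1, global_idx=[0])
print(attended_pairs(mask), "of", 8 * 8, "pairs are scored")
```

With a fixed window and a fixed number of global positions, the number of scored pairs grows roughly linearly with sequence length, whereas full attention scores all n × n pairs. That is the general reason sparse patterns of this kind cost less than full attention.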

Findings

1. Achieves state-of-the-art F-scores on SumMe and TVSum datasets.
2. Reduces computational costs compared to traditional full attention models.
3. Outperforms second-best methods by small margins, confirming effectiveness.
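For readers unfamiliar with the metric, the F-score used in video summarization benchmarks is commonly computed as the harmonic mean of precision and recall over the overlap between a predicted summary and a reference summary. The sketch below assumes both summaries are given as binary per-frame selections of equal length; the function name `summary_f_score` is hypothetical, and dataset-specific protocol details (such as how scores are aggregated across multiple annotators) are not covered.

```python
def summary_f_score(pred, ref):
    """F-score between two binary per-frame selections of equal length.

    pred[i] and ref[i] are truthy when frame i is included in the
    predicted or reference summary, respectively.
    """
    if len(pred) != len(ref):
        raise ValueError("pred and ref must have the same number of frames")
    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    if overlap == 0:
        return 0.0                                   # no shared frames
    precision = overlap / sum(1 for p in pred if p)  # share of predicted frames that match
    recall = overlap / sum(1 for r in ref if r)      # share of reference frames recovered
    return 2 * precision * recall / (precision + recall)

# Usage: two 5-frame selections that share two frames.
print(summary_f_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```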

Abstract

Video summarization aims to generate a compact, informative, and representative synopsis of raw videos, which is crucial for browsing, analyzing, and understanding video content. Dominant approaches in video summarization primarily rely on recurrent or convolutional neural networks, and more recently on encoder-only transformer architectures. However, these methods typically suffer from limitations in parallelism, in modeling long-range dependencies, and in providing explicit generative capabilities. To address these issues, we propose a transformer-like architecture named FullTransNet, built on two ideas. First, it uses a full transformer with an encoder-decoder structure as an alternative architecture for video summarization. As the full transformer is specifically designed for sequence transduction tasks, its direct application to video summarization is both intuitive and…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques · Video Analysis and Summarization

Methods: Softmax · Attention Is All You Need