Semantic-Guided Unsupervised Video Summarization

Haizhou Liu; Haodong Jin; Yiming Wang; Hui Yu

arXiv:2601.14773 · cs.AI · January 22, 2026

TL;DR

This paper introduces a semantic-guided unsupervised video summarization method that leverages semantic alignment and incremental training to improve keyframe selection and address GAN training instability.

Contribution

It proposes a novel semantic alignment attention mechanism and an incremental training strategy within an adversarial framework for better video summarization.

Findings

1. Achieves superior performance on benchmark datasets.
2. Effectively guides keyframe selection with semantic information.
3. Reduces training instability in GAN-based models.

Abstract

Video summarization is a crucial technique for social understanding, enabling efficient browsing of massive multimedia content and extraction of key information from social platforms. Most existing unsupervised summarization methods rely on Generative Adversarial Networks (GANs) to enhance keyframe selection and generate coherent video summaries through adversarial training. However, such approaches primarily exploit unimodal features, overlooking the guiding role of semantic information in keyframe selection, and they often suffer from unstable training. To address these limitations, we propose a novel Semantic-Guided Unsupervised Video Summarization method. Specifically, we design a novel frame-level semantic alignment attention mechanism and integrate it into a keyframe selector, which guides the Transformer-based generator within the adversarial framework to better reconstruct videos.…
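
The abstract does not give the formulation of the attention mechanism, so what follows is only a minimal sketch of one plausible reading of "frame-level semantic alignment attention feeding a keyframe selector": standard scaled dot-product cross-attention in which per-frame visual features act as queries over a small set of semantic embeddings (for example, text embeddings of a caption), followed by a per-frame scoring head. Every name here (semantic_alignment_attention, keyframe_scores, select_keyframes, the weight matrices, and the 15% budget) is hypothetical and not taken from the paper. The Transformer-based generator, the discriminator, the adversarial losses, and the incremental training strategy are not modeled.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def semantic_alignment_attention(frames, semantics, w_q, w_k, w_v):
    # Assumed form: every frame is a query that attends over the
    # semantic tokens, which supply the keys and values.
    q = frames @ w_q          # (T, d)
    k = semantics @ w_k       # (S, d)
    v = semantics @ w_v       # (S, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (T, S); each row sums to 1
    return attn @ v, attn     # semantically aligned frame features, weights

def keyframe_scores(frames, semantics, params):
    # Hypothetical selector head: fuse projected visual features with their
    # semantically aligned counterparts and map each frame to a score in (0, 1).
    aligned, attn = semantic_alignment_attention(
        frames, semantics, params["w_q"], params["w_k"], params["w_v"])
    fused = np.concatenate([frames @ params["w_f"], aligned], axis=-1)  # (T, 2d)
    logits = fused @ params["w_out"]                                   # (T,)
    return 1.0 / (1.0 + np.exp(-logits)), attn

def select_keyframes(scores, ratio=0.15):
    # Keep the top-scoring frames under an assumed summary budget,
    # returned in temporal order.
    k = max(1, int(round(ratio * len(scores))))
    return np.sort(np.argsort(-scores)[:k])

# Usage with random, untrained weights (illustration only).
rng = np.random.default_rng(0)
T, S, d_f, d_s, d = 20, 4, 16, 8, 8
frames = rng.normal(size=(T, d_f))     # stand-in for per-frame visual features
semantics = rng.normal(size=(S, d_s))  # stand-in for semantic (e.g. text) embeddings
params = {
    "w_q": rng.normal(scale=0.1, size=(d_f, d)),
    "w_k": rng.normal(scale=0.1, size=(d_s, d)),
    "w_v": rng.normal(scale=0.1, size=(d_s, d)),
    "w_f": rng.normal(scale=0.1, size=(d_f, d)),
    "w_out": rng.normal(scale=0.1, size=(2 * d,)),
}
scores, attn = keyframe_scores(frames, semantics, params)
keyframes = select_keyframes(scores)
print(keyframes)
```

Because the weights are random, the selected indices carry no meaning. In the framework the abstract describes, the selector would instead be trained so that its choices guide the generator's reconstruction of the video. The 15% budget mirrors a summary-length limit commonly used in video summarization evaluation, but it is an assumption here.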

Taxonomy

Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications