Masked Autoencoder for Unsupervised Video Summarization

Minho Shim; Taeoh Kim; Jinhyung Kim; Dongyoon Wee

arXiv:2306.01395 · cs.CV · June 5, 2023 · 1 citation

TL;DR

This paper introduces an unsupervised autoencoder approach that leverages self-supervised learning and uses the autoencoder's reconstruction scores to perform video summarization, a dense video understanding task, without additional architecture modifications or fine-tuning.

Contribution

It demonstrates that a self-supervised autoencoder can be used directly for video summarization by deriving frame importance from its reconstruction scores, eliminating the need for extra downstream architecture design or fine-tuning.
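This page does not specify how frames are masked or how reconstruction scores are turned into importance values. The following is a minimal sketch, assuming each frame feature is masked in turn and then reconstructed from its temporal neighbors. `reconstruct_masked` is a hypothetical stand-in for the trained MAE decoder and simply averages nearby frames. Scoring frames that are harder to reconstruct as more important is also an assumption, not necessarily the authors' convention.

```python
import numpy as np


def reconstruct_masked(features: np.ndarray, t: int, window: int = 2) -> np.ndarray:
    """Hypothetical stand-in for a trained MAE decoder.

    Frame t is treated as masked, and its feature vector is predicted as the
    mean of the visible frames within `window` steps on either side.
    """
    n = features.shape[0]
    lo, hi = max(0, t - window), min(n, t + window + 1)
    context = [i for i in range(lo, hi) if i != t]
    if not context:
        raise ValueError("need at least two frames to reconstruct from context")
    return features[context].mean(axis=0)


def importance_scores(features: np.ndarray, window: int = 2) -> np.ndarray:
    """Mask each frame in turn, measure its reconstruction error (MSE), and
    min-max normalize the errors to [0, 1].

    Assumption: a frame that is harder to reconstruct is scored as more important.
    """
    n = features.shape[0]
    errors = np.array([
        np.mean((features[t] - reconstruct_masked(features, t, window)) ** 2)
        for t in range(n)
    ])
    span = errors.max() - errors.min()
    return (errors - errors.min()) / span if span > 0 else np.zeros(n)


# Toy input: six 4-dimensional frame features; frame 3 differs from the rest.
features = np.zeros((6, 4))
features[3] = 1.0
scores = importance_scores(features)
print(scores.argmax())  # -> 3
```

Swapping the neighbor average for a real decoder would change only `reconstruct_masked`. The masking loop and the normalization would stay the same.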

Findings

01 Effective on major unsupervised video summarization benchmarks

02 No additional architecture or fine-tuning required

03 Uses reconstruction scores for importance estimation

Abstract

Summarizing a video requires a broad understanding of its content, from recognizing scenes to judging whether each frame is essential enough to be selected for the summary. Self-supervised learning (SSL) is recognized for its robustness and its flexibility across multiple downstream tasks. However, video SSL has not yet demonstrated its value for dense understanding tasks such as video summarization. We claim that an unsupervised autoencoder trained with sufficient self-supervised learning can serve as a video summarization model without any extra downstream architecture design or fine-tuned weights. The proposed method estimates the importance score of each frame from the reconstruction score of the autoencoder's decoder. We evaluate the method on major unsupervised video summarization benchmarks and show its effectiveness under various experimental settings.
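The abstract ends at per-frame importance scores. Benchmarks, however, compare selected summaries. Prior unsupervised video summarization work commonly converts frame scores into a summary in three steps. First, it averages the scores within each shot segment. Second, it solves a 0/1 knapsack over the segments with a budget of 15% of the video length. Third, it keeps the frames of the chosen segments. This page does not confirm that the paper follows this protocol, so treat the sketch below as an assumption. The segment boundaries are supplied by hand here rather than computed by a shot detector. The function names `knapsack_select` and `frame_scores_to_summary` are hypothetical.

```python
import math

import numpy as np


def knapsack_select(values, weights, capacity):
    """0/1 knapsack by dynamic programming; returns the chosen item indices."""
    n = len(values)
    dp = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for c in range(capacity + 1):
            dp[i][c] = dp[i - 1][c]  # skip item i-1
            if w <= c and dp[i - 1][c - w] + v > dp[i][c]:
                dp[i][c] = dp[i - 1][c - w] + v  # take item i-1
    # Walk back through the table to recover which items were taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if dp[i][c] != dp[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return sorted(chosen)


def frame_scores_to_summary(frame_scores, boundaries, proportion=0.15):
    """Select segments under a length budget.

    boundaries: list of half-open (start, end) frame index pairs (assumed given).
    Returns a boolean mask marking the frames kept in the summary.
    """
    n_frames = len(frame_scores)
    seg_values = [float(np.mean(frame_scores[s:e])) for s, e in boundaries]
    seg_lengths = [e - s for s, e in boundaries]
    capacity = int(math.floor(n_frames * proportion))
    picked = knapsack_select(seg_values, seg_lengths, capacity)
    summary = np.zeros(n_frames, dtype=bool)
    for k in picked:
        s, e = boundaries[k]
        summary[s:e] = True
    return summary


# Toy input: 30 frames; the short segment 10..12 has the highest scores.
fs = np.full(30, 0.1)
fs[10:13] = 0.9
segments = [(0, 10), (10, 13), (13, 20), (20, 30)]
summary = frame_scores_to_summary(fs, segments)
print(np.flatnonzero(summary))  # -> [10 11 12]
```

In this sketch, segment lengths act as knapsack weights, so a long segment with a moderate score can be excluded in favor of shorter, higher-scoring ones.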

Taxonomy

Topics: Video Analysis and Summarization · Music and Audio Processing · Natural Language Processing Techniques