Reinforcement Learning for Unsupervised Video Summarization with Reward Generator Training

Mehryar Abbasi; Hadi Hadizadeh; Parvaneh Saeedi

arXiv:2407.04258·cs.MM·December 24, 2025

TL;DR

This paper introduces an unsupervised video summarization method using reinforcement learning, where a reconstruction-based reward generator improves training stability and aligns summaries with human preferences.

Contribution

It proposes an RL-based framework in which a self-supervised, pre-trained generator computes the reward, avoiding the instability of adversarial training in unsupervised video summarization.

Findings

01. Achieves high correlation with human judgments.

02. Demonstrates improved training stability.

03. Attains promising F-scores on benchmark datasets.

Abstract

This paper presents a novel approach for unsupervised video summarization using reinforcement learning (RL), addressing limitations like unstable adversarial training and reliance on heuristic-based reward functions. The method operates on the principle that reconstruction fidelity serves as a proxy for informativeness, correlating summary quality with reconstruction ability. The summarizer model assigns importance scores to frames to generate the final summary. For training, RL is coupled with a unique reward generation pipeline that incentivizes improved reconstructions. This pipeline uses a generator model to reconstruct the full video from the selected summary frames; the similarity between the original and reconstructed video provides the reward signal. The generator itself is pre-trained in a self-supervised manner to reconstruct randomly masked frames. This two-stage training process…
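
The reward loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the learned, pre-trained generator is replaced here by a nearest-selected-frame copy, frames are plain feature vectors, similarity is taken to be cosine similarity, and frame selection is assumed to be a Bernoulli sample per frame with a standard REINFORCE gradient. All function names (`reconstruct`, `reward`, `sample_selection`, `reinforce_gradient`) are hypothetical.

```python
import math
import random


def reconstruct(frames, selected):
    """Stand-in generator: fill every position with the nearest selected frame.

    Assumption: the paper uses a learned generator pre-trained on masked
    frames; this nearest-neighbor copy only keeps the sketch runnable.
    """
    kept = [i for i, s in enumerate(selected) if s]
    if not kept:  # nothing selected: reconstruct as all-zero frames
        return [[0.0] * len(f) for f in frames]
    out = []
    for t in range(len(frames)):
        nearest = min(kept, key=lambda i: abs(i - t))
        out.append(list(frames[nearest]))
    return out


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def reward(frames, selected):
    """Mean per-frame similarity between the original and reconstructed video."""
    recon = reconstruct(frames, selected)
    return sum(cosine(f, r) for f, r in zip(frames, recon)) / len(frames)


def sample_selection(scores, rng):
    """Sample a keep (1) or drop (0) action per frame from Bernoulli(score)."""
    return [1 if rng.random() < p else 0 for p in scores]


def reinforce_gradient(scores, selected, advantage):
    """Score-function gradient of the expected reward w.r.t. each probability.

    Uses d/dp log Bernoulli(a; p) = (a - p) / (p * (1 - p)), so every score
    must lie strictly between 0 and 1.
    """
    return [advantage * (a - p) / (p * (1.0 - p)) for a, p in zip(selected, scores)]


if __name__ == "__main__":
    rng = random.Random(0)
    frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # toy features
    scores = [0.5, 0.5, 0.5, 0.5]  # importance scores from a summarizer
    actions = sample_selection(scores, rng)
    r = reward(frames, actions)
    print(actions, r, reinforce_gradient(scores, actions, r))
```

In this toy setup, a selection that keeps one frame from each of the two distinct segments reconstructs the video perfectly, while keeping a frame from only one segment recovers half of it, which is the sense in which reconstruction fidelity acts as a proxy for informativeness. A practical system would also need some constraint on summary length, which this sketch omits.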
