Video Summarization through Reinforcement Learning with a 3D Spatio-Temporal U-Net

Tianrui Liu; Qingjie Meng; Jun-Jie Huang; Athanasios Vlontzos; Daniel Rueckert; Bernhard Kainz

arXiv:2106.10528·cs.CV·February 16, 2022

TL;DR

This paper presents a novel reinforcement learning framework using a 3D spatio-temporal U-Net for efficient video summarization, capable of operating in supervised and unsupervised modes, and applicable to medical videos.

Contribution

Introduction of the 3DST-UNet-RL framework that combines 3D CNN encoding with reinforcement learning for improved video summarization performance.

Findings

01

Effective in both supervised and unsupervised modes.

02

Outperforms baseline methods on standard benchmarks.

03

Applicable to medical video summarization, reducing storage and improving review efficiency.

Abstract

Intelligent video summarization algorithms quickly convey the most relevant information in videos by identifying the most essential and explanatory content while removing redundant video frames. In this paper, we introduce the 3DST-UNet-RL framework for video summarization. A 3D spatio-temporal U-Net is used to efficiently encode spatio-temporal information of the input videos for downstream reinforcement learning (RL). An RL agent learns from spatio-temporal latent scores and predicts actions for keeping or rejecting a video frame in a video summary. We investigate whether real/inflated 3D spatio-temporal CNN features are better suited to learn representations from videos than commonly used 2D image features. Our framework can operate in both a fully unsupervised mode and a supervised training mode. We analyse the impact of prescribed summary lengths and show…
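
The keep-or-reject decision described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each frame's score is treated as the probability of keeping that frame, so an action is a Bernoulli draw per frame. The names `sample_actions`, `log_likelihood`, and `frame_scores` are hypothetical, the scores are made-up numbers rather than U-Net outputs, and the reward used to train the agent is not described in the text above, so it is left out.

```python
import math
import random


def sample_actions(scores, rng):
    """Draw one keep (1) / reject (0) action per frame.

    Assumption: each score is read as the probability of keeping that frame.
    """
    return [1 if rng.random() < p else 0 for p in scores]


def log_likelihood(scores, actions, eps=1e-8):
    """Log-probability of the sampled actions under the Bernoulli policy.

    A policy-gradient method would weight this quantity by a reward;
    the reward itself is not specified here.
    """
    return sum(
        math.log(p + eps) if a == 1 else math.log(1.0 - p + eps)
        for p, a in zip(scores, actions)
    )


# Hypothetical per-frame scores in [0, 1]; in the paper these would come
# from the 3D spatio-temporal U-Net, which is not reproduced here.
frame_scores = [0.9, 0.1, 0.8, 0.2, 0.7]
rng = random.Random(0)
actions = sample_actions(frame_scores, rng)
kept_frames = [i for i, a in enumerate(actions) if a == 1]
summary_ratio = len(kept_frames) / len(frame_scores)
```

Here `kept_frames` holds the indices of frames selected for the summary, and `summary_ratio` is the fraction of frames kept, which is the kind of quantity a prescribed summary length would constrain.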

Taxonomy

Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net