Video Summarization with Attention-Based Encoder-Decoder Networks

Zhong Ji; Kailin Xiong; Yanwei Pang; Xuelong Li

arXiv:1708.09545·cs.CV·April 17, 2018·54 cites

TL;DR

This paper introduces AVS, an attention-based encoder-decoder neural network for supervised video summarization, which effectively learns to select keyshots by mimicking human summarization through sequence-to-sequence modeling.

Contribution

The paper proposes AVS, a framework that pairs a BiLSTM encoder with attention-based LSTM decoders for supervised video summarization.

Findings

01 AVS outperforms state-of-the-art methods on SumMe and TVSum datasets.

02 Achieves 0.8% to 3% improvement in keyshot selection accuracy.

03 Demonstrates the effectiveness of attention mechanisms in video summarization.

Abstract

This paper addresses the problem of supervised video summarization by formulating it as a sequence-to-sequence learning problem, where the input is a sequence of original video frames and the output is a keyshot sequence. Our key idea is to learn a deep summarization network with an attention mechanism that mimics the way humans select keyshots. To this end, we propose a novel video summarization framework named Attentive encoder-decoder networks for Video Summarization (AVS), in which the encoder uses a Bidirectional Long Short-Term Memory (BiLSTM) network to encode the contextual information among the input video frames. As for the decoder, two attention-based LSTM networks are explored, using additive and multiplicative objective functions, respectively. Extensive experiments are conducted on two video summarization benchmark datasets, i.e., SumMe and TVSum. The results demonstrate…
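The abstract names additive and multiplicative variants of the attention-based decoder without giving their formulas. As a rough illustration only, the sketch below shows the two attention scoring forms those names commonly refer to, applied to a stand-in for the encoder outputs. It is not the paper's implementation: every function name, weight matrix, and dimension is an assumption made for this example, and random arrays take the place of real BiLSTM states.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def additive_scores(enc, dec_state, W_enc, W_dec, v):
    """Additive scoring: v . tanh(enc W_enc + dec_state W_dec), one score per frame."""
    return np.tanh(enc @ W_enc + dec_state @ W_dec) @ v

def multiplicative_scores(enc, dec_state, W):
    """Multiplicative (bilinear) scoring: enc W dec_state, one score per frame."""
    return enc @ W @ dec_state

def attend(enc, scores):
    """Turn scores into attention weights and a weighted sum of encoder states."""
    weights = softmax(scores)
    context = weights @ enc
    return weights, context

# Hypothetical sizes: T frames, encoder/decoder/attention dimensions.
T, d_enc, d_dec, d_att = 6, 8, 5, 4
rng = np.random.default_rng(0)

enc = rng.standard_normal((T, d_enc))    # stand-in for BiLSTM encoder outputs
dec_state = rng.standard_normal(d_dec)   # stand-in for the current decoder state

# Randomly initialised parameters; in a trained model these would be learned.
W_enc = rng.standard_normal((d_enc, d_att))
W_dec = rng.standard_normal((d_dec, d_att))
v = rng.standard_normal(d_att)
W = rng.standard_normal((d_enc, d_dec))

add_w, add_ctx = attend(enc, additive_scores(enc, dec_state, W_enc, W_dec, v))
mul_w, mul_ctx = attend(enc, multiplicative_scores(enc, dec_state, W))

print("additive weights:      ", np.round(add_w, 3))
print("multiplicative weights:", np.round(mul_w, 3))
```

In both variants the weights sum to one across the input frames, so the context vector is a convex combination of the encoder states; the two differ only in how each frame is scored against the decoder state.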

Taxonomy

Topics: Video Analysis and Summarization · Music and Audio Processing · Advanced Image and Video Retrieval Techniques

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory