# Summarizing Videos with Attention

**Authors:** Jiri Fajtl, Hajar Sadeghi Sokeh, Vasileios Argyriou, Dorothy Monekosso, Paolo Remagnino

arXiv: 1812.01969 · 2019-02-22

## TL;DR

This paper introduces a simple, computationally efficient self-attention method for supervised, keyshot-based video summarization that outperforms more complex BiLSTM-based models on the TvSum and SumMe benchmarks.

## Contribution

The authors propose a novel self-attention approach for video summarization that simplifies the model and improves performance over bi-directional recurrent network methods.

## Key findings

- Achieved state-of-the-art results on TvSum and SumMe benchmarks.
- Reduced computational complexity compared to BiLSTM-based methods.
- Performed the entire sequence-to-sequence transformation in a single feed-forward pass, with a single backward pass during training.

## Abstract

In this work we propose a novel method for supervised, keyshots based video summarization by applying a conceptually simple and computationally efficient soft, self-attention mechanism. Current state of the art methods leverage bi-directional recurrent networks such as BiLSTM combined with attention. These networks are complex to implement and computationally demanding compared to fully connected networks. To that end we propose a simple, self-attention based network for video summarization which performs the entire sequence to sequence transformation in a single feed forward pass and single backward pass during training. Our method sets a new state of the art results on two benchmarks TvSum and SumMe, commonly used in this domain.
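The idea of replacing recurrence with soft self-attention can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy dimensions, the random weights, the `1/sqrt(d)` scaling, and the sigmoid scoring head are all assumptions made for illustration. It only shows how every frame can attend to every other frame, and receive an importance score, without stepping through the sequence recurrently.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(frames, weights):
    """Multiply each frame vector (length d_in) by a d_in x d_out matrix."""
    return [
        [sum(x * w for x, w in zip(frame, col)) for col in zip(*weights)]
        for frame in frames
    ]


def self_attention(frames, w_query, w_key, w_value):
    """Soft self-attention: each frame attends to all frames, no recurrence."""
    queries = project(frames, w_query)
    keys = project(frames, w_key)
    values = project(frames, w_value)
    scale = 1.0 / math.sqrt(len(keys[0]))  # assumed scaling convention

    contexts, attention = [], []
    for q in queries:
        # Similarity of this frame's query to every frame's key.
        raw = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(raw)
        attention.append(weights)
        # Context vector: attention-weighted average of all value vectors.
        contexts.append(
            [sum(w * v[j] for w, v in zip(weights, values))
             for j in range(len(values[0]))]
        )
    return contexts, attention


def frame_scores(contexts, w_out):
    """Map each context vector to an importance score in (0, 1)."""
    return [
        1.0 / (1.0 + math.exp(-sum(c * w for c, w in zip(ctx, w_out))))
        for ctx in contexts
    ]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or CNN frame features."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
num_frames, dim = 6, 4  # toy sizes chosen for readability
frames = random_matrix(num_frames, dim, rng)
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))
w_out = [rng.uniform(-0.5, 0.5) for _ in range(dim)]

contexts, attention = self_attention(frames, w_q, w_k, w_v)
scores = frame_scores(contexts, w_out)
```

Each row of `attention` is a probability distribution over all frames, so the context for a frame can draw on the whole video at once. A recurrent model would instead need to propagate that information step by step in both directions.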

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.01969/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1812.01969/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/1812.01969/full.md

---
Source: https://tomesphere.com/paper/1812.01969