# SUSiNet: See, Understand and Summarize it

**Authors:** Petros Koutras, Petros Maragos

arXiv: 1812.00722 · 2019-04-16

## TL;DR

SUSiNet is a multi-task spatio-temporal neural network that jointly addresses saliency estimation, action recognition, and video summarization. It matches state-of-the-art single-task methods while requiring less computation than running one independent network per task.

## Contribution

The paper introduces a unified, end-to-end trainable network that handles three video analysis tasks from the same video input. The network can also be deeply supervised through an attention module tied to human attention as expressed by eye-tracking data.

## Key findings

- Performs on par with, or in some cases better than, state-of-the-art single-task methods across seven datasets.
- Requires less computation than running one independent network per task.
- Trains a single network jointly on multiple, diverse datasets that cover the different tasks.
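
The cost argument can be illustrated with a back-of-the-envelope sketch. It assumes the network splits into a shared trunk that runs once per clip plus a small head per task, and that the alternative is one full copy of the trunk for each task. The function names and the cost figures are hypothetical and are not taken from the paper.

```python
def independent_cost(trunk_cost, head_costs):
    """Cost of running one full network (trunk + head) for each task."""
    return sum(trunk_cost + head for head in head_costs)


def shared_cost(trunk_cost, head_costs):
    """Cost of running the shared trunk once, followed by every task head."""
    return trunk_cost + sum(head_costs)


# Hypothetical per-clip costs in arbitrary units (NOT measurements from the paper).
TRUNK = 100.0
HEADS = {"saliency": 10.0, "action": 5.0, "summarization": 2.0}

separate = independent_cost(TRUNK, HEADS.values())
joint = shared_cost(TRUNK, HEADS.values())
print(f"independent: {separate}, shared: {joint}, ratio: {joint / separate:.2f}")
```

With these made-up numbers the shared design needs 117 units per clip against 317 for three independent networks. The saving grows with the share of computation that sits in the shared layers.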

## Abstract

In this work we propose a multi-task spatio-temporal network, called SUSiNet, that can jointly tackle the spatio-temporal problems of saliency estimation, action recognition and video summarization. Our approach employs a single network that is jointly trained end-to-end for all tasks with multiple and diverse datasets related to the tasks under study. The proposed network uses a unified architecture that includes global and task-specific layers and produces multiple output types, i.e., saliency maps or classification labels, from the same video input. An additional contribution is that the proposed network can be deeply supervised through an attention module that is related to human attention as expressed by eye-tracking data. From an extensive evaluation on seven different datasets, we have observed that the multi-task network performs as well as the state-of-the-art single-task methods (or in some cases better), while requiring a smaller computational budget than one independent network per task.
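
One way to train a single network on datasets that each annotate only some of the tasks is to sum, for every sample, only the losses of the tasks its dataset provides ground truth for. The sketch below shows that masking step in plain Python. It is an assumption about how such joint training can be organized, not the authors' published procedure, and `multitask_loss`, the task names, and the loss values are all hypothetical.

```python
def multitask_loss(task_losses, available, weights=None):
    """Combine per-task losses, skipping tasks without annotations.

    task_losses: dict mapping task name -> loss value for one sample
    available:   set of task names annotated in the sample's source dataset
    weights:     optional dict of per-task balancing weights (hypothetical)
    """
    weights = weights or {}
    return sum(
        weights.get(task, 1.0) * loss
        for task, loss in task_losses.items()
        if task in available
    )


# Hypothetical loss values produced by the three task-specific outputs.
losses = {"saliency": 0.5, "action": 1.25, "summarization": 0.25}

# A sample from an eye-tracking dataset: only the saliency term contributes.
eye_tracking_loss = multitask_loss(losses, {"saliency"})

# A sample annotated with both action labels and summary scores.
labelled_loss = multitask_loss(losses, {"action", "summarization"})

print(eye_tracking_loss, labelled_loss)
```

Because unannotated tasks contribute nothing to the sum, a sample only drives updates for the outputs it can supervise, while the shared layers receive a training signal from every dataset.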

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.00722/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1812.00722/full.md

## References

70 references — full list in the complete paper: https://tomesphere.com/paper/1812.00722/full.md

---
Source: https://tomesphere.com/paper/1812.00722