# MAIN: Multi-Attention Instance Network for Video Segmentation

**Authors:** Juan Leon Alcazar, Maria A. Bravo, Ali K. Thabet, Guillaume Jeanneret, Thomas Brox, Pablo Arbelaez, Bernard Ghanem

arXiv: 1904.05847 · 2019-04-12

## TL;DR

MAIN is a multi-attention network for instance-level video segmentation that integrates generic spatio-temporal attention cues, achieving state-of-the-art results on Youtube-VOS without sequence- or instance-specific modeling.

## Contribution

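The paper presents MAIN, a multi-attention network that segments multiple video instances in a single forward pass and in real time, relying on generic spatio-temporal attention cues rather than sequence-specific modeling (online learning). It is optimized with a loss function that favors class-agnostic predictions and assigns instance-specific penalties.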

## Key findings

- Achieves state-of-the-art performance on the Youtube-VOS dataset and benchmark.
- Improves the unseen Jaccard by 6.8% and the unseen F-Metric by 12.7%.
- Operates in real time, at 30.3 FPS.
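
For readers unfamiliar with the metric, the Jaccard index reported above is the intersection-over-union between a predicted mask and a ground-truth mask. The sketch below illustrates that definition only. It is not the benchmark's evaluation code, and the function name `jaccard`, the nested-list mask format, and the convention for two empty masks are assumptions made for illustration.

```python
def jaccard(pred, gt):
    """Intersection-over-union of two binary masks of equal shape.

    Masks are nested lists of 0/1 values (an illustrative format,
    not the one used by the benchmark).
    """
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")

    inter = 0  # pixels set in both masks
    union = 0  # pixels set in at least one mask
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0

    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else inter / union


pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]
print(jaccard(pred, gt))  # 2 shared pixels / 4 pixels in the union
```

A per-sequence score would typically average this value over frames and instances; how the benchmark aggregates it is not specified in this summary.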

## Abstract

Instance-level video segmentation requires a solid integration of spatial and temporal information. However, current methods rely mostly on domain-specific information (online learning) to produce accurate instance-level segmentations. We propose a novel approach that relies exclusively on the integration of generic spatio-temporal attention cues. Our strategy, named Multi-Attention Instance Network (MAIN), overcomes challenging segmentation scenarios over arbitrary videos without modelling sequence- or instance-specific knowledge. We design MAIN to segment multiple instances in a single forward pass, and optimize it with a novel loss function that favors class agnostic predictions and assigns instance-specific penalties. We achieve state-of-the-art performance on the challenging Youtube-VOS dataset and benchmark, improving the unseen Jaccard and F-Metric by 6.8% and 12.7% respectively, while operating at real-time (30.3 FPS).

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.05847/full.md

## Figures

26 figures with captions in the complete paper: https://tomesphere.com/paper/1904.05847/full.md

## References

57 references — full list in the complete paper: https://tomesphere.com/paper/1904.05847/full.md

---
Source: https://tomesphere.com/paper/1904.05847