# Unsupervised object segmentation in video by efficient selection of highly probable positive features

**Authors:** Emanuela Haller, Marius Leordeanu

arXiv: 1704.05674 · 2017-04-20

## TL;DR

This paper introduces an efficient unsupervised video object segmentation method that automatically selects highly probable positive features using spatio-temporal, appearance, and motion cues. It reports competitive to state-of-the-art results on Youtube-Objects and SegTrack, and it comes with theoretical guarantees that hold under mild constraints.

## Contribution

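The paper presents an unsupervised segmentation approach built in two stages: pixel-level analysis, followed by a regression model trained on a descriptor computed over groups of pixels. The method selects highly probable positive features automatically, and the authors show that, under mild constraints, learning from these features is guaranteed to yield a correct discriminative classifier.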

## Key findings

- Achieves competitive, and in some cases state-of-the-art, results on the Youtube-Objects and SegTrack datasets.
- At least one order of magnitude faster than competing methods.
- Provides theoretical guarantees, under mild constraints, that the unsupervised procedure learns a correct discriminative classifier.

## Abstract

We address an essential problem in computer vision, that of unsupervised object segmentation in video, where a main object of interest in a video sequence should be automatically separated from its background. An efficient solution to this task would enable large-scale video interpretation at a high semantic level in the absence of the costly manually labeled ground truth. We propose an efficient unsupervised method for generating foreground object soft-segmentation masks based on automatic selection and learning from highly probable positive features. We show that such features can be selected efficiently by taking into consideration the spatio-temporal, appearance and motion consistency of the object during the whole observed sequence. We also emphasize the role of the contrasting properties between the foreground object and its background. Our model is created in two stages: we start from pixel-level analysis, on top of which we add a regression model trained on a descriptor that considers information over groups of pixels and is both discriminative and invariant to many changes that the object undergoes throughout the video. We also present theoretical properties of our unsupervised learning method, which, under some mild constraints, is guaranteed to learn a correct discriminative classifier even in the unsupervised case. Our method achieves competitive and even state-of-the-art results on the challenging Youtube-Objects and SegTrack datasets, while being at least one order of magnitude faster than the competition. We believe that the competitive performance of our method in practice, along with its theoretical properties, constitutes an important step towards solving unsupervised discovery in video.
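The idea of learning from highly probable positive features can be illustrated with a toy sketch. This is not the authors' pipeline: the abstract does not specify the features, the selection cue, or the regression model, so the synthetic 2-D features, the noisy `cue` score, the `top_fraction` parameter, and the ridge regression below are all assumptions chosen for illustration. The sketch keeps only the top-scoring samples as positives, treats everything else as negative, and checks whether a linear regressor trained on those noisy labels still separates the true foreground from the background.

```python
import numpy as np


def select_probable_positives(cue_scores, top_fraction=0.1):
    """Mark the highest-scoring samples as positives (hypothetical selection rule)."""
    k = max(1, int(round(top_fraction * len(cue_scores))))
    threshold = np.partition(cue_scores, -k)[-k]  # k-th largest cue value
    return cue_scores >= threshold


def fit_ridge(features, labels, reg=1e-3):
    """Ridge-regularized least squares with a bias term."""
    X = np.hstack([features, np.ones((len(features), 1))])
    A = X.T @ X + reg * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ labels)


def predict_soft_mask(features, weights):
    """Linear scores; higher means more likely foreground."""
    X = np.hstack([features, np.ones((len(features), 1))])
    return X @ weights


def demo(seed=0):
    rng = np.random.default_rng(seed)
    n_fg, n_bg = 300, 700

    # Synthetic stand-ins for per-pixel features (assumption, not from the paper).
    fg = rng.normal(loc=2.0, scale=1.0, size=(n_fg, 2))
    bg = rng.normal(loc=-2.0, scale=1.0, size=(n_bg, 2))
    features = np.vstack([fg, bg])
    truth = np.concatenate([np.ones(n_fg), np.zeros(n_bg)])

    # A weak, noisy cue standing in for motion/appearance evidence.
    cue = truth + rng.normal(scale=0.8, size=truth.shape)

    # Keep only the most confident samples as positives; all others are negatives.
    positives = select_probable_positives(cue, top_fraction=0.1)
    weights = fit_ridge(features, positives.astype(float))
    scores = predict_soft_mask(features, weights)

    # Threshold the soft mask at its mean score for a simple hard decision.
    predicted = scores > scores.mean()
    return {
        "base_rate": float(truth.mean()),
        "positive_precision": float(truth[positives].mean()),
        "accuracy": float((predicted == truth.astype(bool)).mean()),
    }


if __name__ == "__main__":
    print(demo())
```

In this toy setting the selected set has low recall, since most foreground samples end up labeled negative, yet its precision is well above the foreground base rate. That appears to be enough for the regressor to recover a useful decision boundary, which is the intuition behind the abstract's claim. The actual conditions under which the guarantee holds are stated in the full paper.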

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.05674/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1704.05674/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/1704.05674/full.md

---
Source: https://tomesphere.com/paper/1704.05674