# Unsupervised learning from video to detect foreground objects in single images

**Authors:** Ioana Croitoru (1), Simion-Vlad Bogolin (1), Marius Leordeanu (1 and 2) ((1) Institute of Mathematics of the Romanian Academy, (2) University "Politehnica" of Bucharest)

arXiv: 1703.10901 · 2017-04-03

## TL;DR

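A deep neural network (the student) is trained without labels to detect and segment foreground objects in single images by mimicking the per-frame output of an unsupervised video object discovery process (the teacher). It reaches state-of-the-art results on the YouTube Objects and Object Discovery datasets and is at least two orders of magnitude faster at test time than previous methods.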

## Contribution

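The method moves unsupervised object discovery from test time to training time. A teacher pathway discovers objects in video without supervision, and a student network learns to predict the teacher's output for a frame from that single frame alone. At test time only the student's standard feed-forward pass runs, so inference is fast and needs no video.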
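The teacher-to-student idea can be illustrated with a toy sketch. Everything below is an assumption made for illustration and not the authors' method: the "video" is a synthetic 1-D sequence, the teacher is a simple temporal-median background subtraction standing in for the paper's video discovery pathway, and the student is a per-pixel logistic model standing in for the deep network. The names `make_video`, `teacher_masks`, `train_student`, and `student_predict` are hypothetical.

```python
import math
import random
from statistics import median


def make_video(rng, n_frames=12, width=16, obj_len=4):
    """Synthetic 1-D video: a bright object slides over a dark, noisy background."""
    frames = []
    for t in range(n_frames):
        start = t % (width - obj_len + 1)
        frame = [0.2 + rng.uniform(-0.05, 0.05) for _ in range(width)]
        for x in range(start, start + obj_len):
            frame[x] = 0.8 + rng.uniform(-0.05, 0.05)
        frames.append(frame)
    return frames


def teacher_masks(frames, thresh=0.3):
    """Stand-in teacher (assumption): needs the whole video, since the
    background is estimated as the per-pixel temporal median."""
    width = len(frames[0])
    background = [median(f[x] for f in frames) for x in range(width)]
    return [[1.0 if abs(f[x] - background[x]) > thresh else 0.0
             for x in range(width)] for f in frames]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_student(frames, masks, epochs=1000, lr=2.0):
    """Fit a per-pixel logistic model to the teacher's masks
    (cross-entropy loss, full-batch gradient descent)."""
    pairs = [(v, m) for f, mk in zip(frames, masks) for v, m in zip(f, mk)]
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for v, m in pairs:
            err = sigmoid(w * v + b) - m  # gradient of cross-entropy w.r.t. the logit
            grad_w += err * v
            grad_b += err
        w -= lr * grad_w / len(pairs)
        b -= lr * grad_b / len(pairs)
    return w, b


def student_predict(frame, w, b):
    """Single-image inference: one feed-forward pass, no video needed."""
    return [1.0 if sigmoid(w * v + b) > 0.5 else 0.0 for v in frame]


# Training time: the teacher sees the video, the student sees single frames.
rng = random.Random(0)
video = make_video(rng)
w, b = train_student(video, teacher_masks(video))

# Test time: a fresh single frame with the object at pixels 6..9.
test_frame = [0.2 + rng.uniform(-0.05, 0.05) for _ in range(16)]
for x in range(6, 10):
    test_frame[x] = 0.8 + rng.uniform(-0.05, 0.05)
predicted = student_predict(test_frame, w, b)
```

The design point the sketch tries to capture is the asymmetry between the two pathways: `teacher_masks` cannot run on one frame because it depends on temporal statistics, while `student_predict` uses only the pixels of the frame it is given.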

## Key findings

- Achieves state-of-the-art results on two benchmarks, the YouTube Objects and Object Discovery datasets.
- Runs at least two orders of magnitude faster at test time than previous methods.
- The student detects foreground objects in single images significantly better than its unsupervised video discovery teacher.

## Abstract

Unsupervised learning from visual data is one of the most difficult challenges in computer vision, being a fundamental task for understanding how visual recognition works. From a practical point of view, learning from unsupervised visual input has an immense practical value, as very large quantities of unlabeled videos can be collected at low cost. In this paper, we address the task of unsupervised learning to detect and segment foreground objects in single images. We achieve our goal by training a student pathway, consisting of a deep neural network. It learns to predict from a single input image (a video frame) the output for that particular frame, of a teacher pathway that performs unsupervised object discovery in video. Our approach is different from the published literature that performs unsupervised discovery in videos or in collections of images at test time. We move the unsupervised discovery phase during the training stage, while at test time we apply the standard feed-forward processing along the student pathway. This has a dual benefit: firstly, it allows in principle unlimited possibilities of learning and generalization during training, while remaining very fast at testing. Secondly, the student not only becomes able to detect in single images significantly better than its unsupervised video discovery teacher, but it also achieves state of the art results on two important current benchmarks, YouTube Objects and Object Discovery datasets. Moreover, at test time, our system is at least two orders of magnitude faster than other previous methods.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.10901/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1703.10901/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/1703.10901/full.md

---
Source: https://tomesphere.com/paper/1703.10901