FusionSeg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos

Suyog Dutt Jain; Bo Xiong; Kristen Grauman

arXiv:1701.05384 · cs.CV · April 13, 2017 · 151 cites

Open Access

TL;DR

FusionSeg introduces an end-to-end neural network that combines motion and appearance cues to automatically segment generic objects in videos. It is trained on weakly annotated videos together with existing image datasets and achieves state-of-the-art results on three video segmentation benchmarks.

Contribution

The paper presents a two-stream fully convolutional network that fuses motion and appearance for video segmentation, trained by bootstrapping weakly annotated videos together with existing image recognition datasets.

Findings

01. Significant improvement over previous methods on three benchmarks.

02. Effective use of weakly annotated videos for training.

03. Achieves state-of-the-art segmentation of unseen objects.

Abstract

We propose an end-to-end learning framework for segmenting generic objects in videos. Our method learns to combine appearance and motion information to produce pixel-level segmentation masks for all prominent objects in videos. We formulate this task as a structured prediction problem and design a two-stream fully convolutional neural network which fuses together motion and appearance in a unified framework. Since large-scale video datasets with pixel-level segmentations are problematic, we show how to bootstrap weakly annotated videos together with existing image recognition datasets for training. Through experiments on three challenging video segmentation benchmarks, our method substantially improves the state of the art for segmenting generic (unseen) objects. Code and pre-trained models are available on the project website.
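
To make the idea of fusing two streams concrete, here is a minimal sketch in plain Python. It is not the authors' network: it assumes each stream has already produced a per-pixel foreground logit map, and it combines them with a per-pixel weighted sum (the equivalent of a 1x1 convolution over two channels) followed by a sigmoid. The function names `fuse_streams` and `to_mask`, the weights `w_app`, `w_mot`, `bias`, and the 0.5 threshold are all hypothetical choices for illustration; in a trained model such weights would be learned.

```python
import math


def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_streams(appearance_logits, motion_logits, w_app=1.0, w_mot=1.0, bias=0.0):
    """Fuse two per-pixel logit maps into one foreground probability map.

    Hypothetical fusion rule: a weighted sum of the two logits at each
    pixel, passed through a sigmoid. The weights are illustrative only.
    """
    if len(appearance_logits) != len(motion_logits):
        raise ValueError("logit maps must have the same height")
    fused = []
    for a_row, m_row in zip(appearance_logits, motion_logits):
        if len(a_row) != len(m_row):
            raise ValueError("logit maps must have the same width")
        fused.append([sigmoid(w_app * a + w_mot * m + bias)
                      for a, m in zip(a_row, m_row)])
    return fused


def to_mask(prob_map, threshold=0.5):
    """Binarize a probability map into a 0/1 segmentation mask."""
    return [[1 if p > threshold else 0 for p in row] for row in prob_map]


# Toy 2x2 logit maps standing in for the outputs of the two streams.
appearance = [[2.0, -1.0], [0.5, -3.0]]
motion = [[1.0, 2.0], [-2.0, -1.0]]
mask = to_mask(fuse_streams(appearance, motion))
```

With equal weights, a pixel is labeled foreground when the two logits sum to a positive value, so strong motion evidence can override weak appearance evidence and vice versa. This is only one simple way to combine the cues.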

Taxonomy

Topics: Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods · Face Recognition and Analysis