Learning Video Object Segmentation with Visual Memory

Pavel Tokmakov; Karteek Alahari; Cordelia Schmid

arXiv:1704.05737 · cs.CV · July 13, 2017 · 46 cites

TL;DR

This paper presents a novel two-stream neural network with a visual memory module for segmenting moving objects in unconstrained videos, achieving state-of-the-art results on benchmark datasets.

Contribution

Introduces a two-stream neural network with a convolutional recurrent memory module for video object segmentation, capturing object evolution without manual annotations.

Findings

01. Outperforms previous methods by nearly 6% on the DAVIS dataset

02. Encodes spatial and temporal features effectively with the visual memory

03. An extensive ablation study confirms the contribution of each component

Abstract

This paper addresses the task of segmenting moving objects in unconstrained videos. We introduce a novel two-stream neural network with an explicit memory module to achieve this. The two streams of the network encode spatial and temporal features in a video sequence respectively, while the memory module captures the evolution of objects over time. The module to build a "visual memory" in video, i.e., a joint representation of all the video frames, is realized with a convolutional recurrent unit learned from a small number of training video sequences. Given a video frame as input, our approach assigns each pixel an object or background label based on the learned spatio-temporal features as well as the "visual memory" specific to the video, acquired automatically without any manually-annotated frames. The visual memory is implemented with convolutional gated recurrent units, which allows…
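
The abstract says the visual memory is built from convolutional gated recurrent units. As a rough illustration only, the sketch below implements a generic convolutional GRU cell in NumPy and runs it over a sequence of per-frame feature maps. It is not the authors' implementation: the names `conv2d_same`, `ConvGRUCell`, and `run_memory` are hypothetical, the gate equations follow the standard GRU form with convolutions in place of matrix products, biases are omitted, the weights are random rather than learned, and the random inputs stand in for the features that the two streams would produce.

```python
import numpy as np


def conv2d_same(x, w):
    """Zero-padded 'same' cross-correlation.

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k. Returns (C_out, H, W).
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Mix input channels for kernel tap (i, j) on the shifted window.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j],
                             xp[:, i:i + h, j:j + wd])
    return out


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


class ConvGRUCell:
    """Generic ConvGRU cell (hypothetical helper, biases omitted)."""

    def __init__(self, in_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)
        shape = (hid_ch, in_ch + hid_ch, k, k)
        # Random weights for illustration; a real model would learn these.
        self.w_z = rng.normal(0.0, 0.1, shape)  # update gate
        self.w_r = rng.normal(0.0, 0.1, shape)  # reset gate
        self.w_h = rng.normal(0.0, 0.1, shape)  # candidate state
        self.hid_ch = hid_ch

    def step(self, x, h):
        xh = np.concatenate([x, h], axis=0)
        z = sigmoid(conv2d_same(xh, self.w_z))
        r = sigmoid(conv2d_same(xh, self.w_r))
        xrh = np.concatenate([x, r * h], axis=0)
        h_tilde = np.tanh(conv2d_same(xrh, self.w_h))
        # Per-pixel blend of the previous memory and the candidate state.
        return (1.0 - z) * h + z * h_tilde


def run_memory(features, cell):
    """Feed per-frame feature maps through the cell, keeping every state."""
    h = np.zeros((cell.hid_ch,) + features[0].shape[1:])
    states = []
    for x in features:
        h = cell.step(x, h)
        states.append(h)
    return states


# Stand-in for two-stream features: 5 frames, 4 channels, 6x8 spatial grid.
rng = np.random.default_rng(1)
features = [rng.normal(size=(4, 6, 8)) for _ in range(5)]
cell = ConvGRUCell(in_ch=4, hid_ch=3)
states = run_memory(features, cell)
```

Because every gate is a convolution, the hidden state keeps the spatial layout of the input, so each location in the memory corresponds to a location in the frame. A per-pixel object/background classifier applied to each state would complete the pipeline the abstract describes; it is left out of this sketch.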

Taxonomy

Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques