Learning Video Object Segmentation with Visual Memory
Pavel Tokmakov, Karteek Alahari, Cordelia Schmid

TL;DR
This paper presents a novel two-stream neural network with a visual memory module for segmenting moving objects in unconstrained videos, achieving state-of-the-art results on benchmark datasets.
Contribution
Introduces a two-stream neural network with a convolutional recurrent memory module for video object segmentation, capturing object evolution without manual annotations.
Findings
Outperforms previous methods by nearly 6% on the DAVIS dataset
The two streams encode spatial and temporal features, and the visual memory captures how objects evolve over time
An extensive ablation analysis examines the influence of each component
Abstract
This paper addresses the task of segmenting moving objects in unconstrained videos. We introduce a novel two-stream neural network with an explicit memory module to achieve this. The two streams of the network encode spatial and temporal features in a video sequence respectively, while the memory module captures the evolution of objects over time. The module to build a "visual memory" in video, i.e., a joint representation of all the video frames, is realized with a convolutional recurrent unit learned from a small number of training video sequences. Given a video frame as input, our approach assigns each pixel an object or background label based on the learned spatio-temporal features as well as the "visual memory" specific to the video, acquired automatically without any manually-annotated frames. The visual memory is implemented with convolutional gated recurrent units, which allows…
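To make the "convolutional gated recurrent unit" idea concrete, the following is a minimal NumPy sketch of a generic ConvGRU cell run over a sequence of per-frame feature maps. It is an illustration under stated assumptions, not the authors' implementation: the class and function names (ConvGRUCell, run_memory, conv2d_same), the kernel size, the random weight initialization, the omission of bias terms, and the update convention h_t = (1 - z) * h_{t-1} + z * h_candidate are all choices made here for the example.

```python
import numpy as np


def conv2d_same(x, w):
    """Zero-padded 'same' 2D cross-correlation.

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) with odd k.
    Returns an array of shape (C_out, H, W).
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1], x.shape[2]
    out = np.zeros((c_out, h, wd))
    # Accumulate the contribution of each kernel offset.
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + h, j:j + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


class ConvGRUCell:
    """Hypothetical ConvGRU cell: GRU gates computed with convolutions."""

    def __init__(self, in_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)
        shape = (hid_ch, in_ch + hid_ch, k, k)
        # Random weights stand in for learned parameters (assumption).
        self.w_z = rng.normal(0.0, 0.1, shape)  # update gate
        self.w_r = rng.normal(0.0, 0.1, shape)  # reset gate
        self.w_h = rng.normal(0.0, 0.1, shape)  # candidate state
        self.hid_ch = hid_ch

    def step(self, x, h):
        """One recurrent update from input x (in_ch, H, W) and state h."""
        xh = np.concatenate([x, h], axis=0)
        z = sigmoid(conv2d_same(xh, self.w_z))
        r = sigmoid(conv2d_same(xh, self.w_r))
        x_rh = np.concatenate([x, r * h], axis=0)
        h_cand = np.tanh(conv2d_same(x_rh, self.w_h))
        # Blend the previous state with the candidate, per pixel and channel.
        return (1.0 - z) * h + z * h_cand


def run_memory(features, cell):
    """Run the cell over a list of per-frame features, starting from zeros."""
    _, h, w = features[0].shape
    state = np.zeros((cell.hid_ch, h, w))
    states = []
    for x in features:
        state = cell.step(x, state)
        states.append(state)
    return states
```

Because every gate is a convolution, the hidden state keeps the spatial layout of the input features, so each location's state can later be mapped to an object or background label. In this sketch the state at frame t depends only on frames up to t.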
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques
