LIP: Learning Instance Propagation for Video Object Segmentation
Ye Lyu, George Vosselman, Gui-Song Xia, Michael Ying Yang

TL;DR
This paper introduces a novel end-to-end deep neural network combining Mask-RCNN and Conv-GRU for semi-supervised video object segmentation, effectively handling appearance changes, occlusions, and scale variations.
Contribution
It presents a unified model that integrates an instance segmentation network (Mask-RCNN) with a visual memory module (Conv-GRU), producing video object segmentation results without post-processing.
Findings
Achieves competitive results on the DAVIS 2016 and DAVIS 2017 benchmarks.
Handles multiple objects and occlusions effectively.
Requires neither post-processing nor synthetic training data.
Abstract
In recent years, the task of segmenting foreground objects from the background in a video, i.e., video object segmentation (VOS), has received considerable attention. In this paper, we propose a single end-to-end trainable deep neural network, convolutional gated recurrent Mask-RCNN, for the semi-supervised VOS task. It combines an instance segmentation network (Mask-RCNN) with a visual memory module (Conv-GRU). The instance segmentation network predicts masks for instances, while the visual memory module learns to selectively propagate information for multiple instances simultaneously, handling appearance changes, variations in scale and pose, and occlusions between objects. After offline and online training with instance segmentation losses only, our approach achieves satisfactory results without any post-processing…
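The abstract describes the Conv-GRU visual memory only at a high level. As an illustration, here is a minimal pure-Python sketch of a standard convolutional GRU update, assuming single-channel feature maps, odd-sized (e.g. 3x3) kernels and no bias terms, and assuming each instance keeps its own hidden state while all instances share one set of gate weights. The names `conv2d_same`, `ConvGRUCell`, `zeros_kernel` and `propagate` are hypothetical; this is not the authors' implementation, and it does not model how Mask-RCNN features are fed into the memory.

```python
import math
import operator
from typing import Dict, List

Grid = List[List[float]]  # single-channel feature map indexed [row][col]


def conv2d_same(x: Grid, k: Grid) -> Grid:
    """Zero-padded 'same' 2-D cross-correlation with an odd-sized kernel."""
    rows, cols = len(x), len(x[0])
    kr, kc = len(k), len(k[0])
    pr, pc = kr // 2, kc // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for a in range(kr):
                for b in range(kc):
                    ii, jj = i + a - pr, j + b - pc
                    if 0 <= ii < rows and 0 <= jj < cols:  # zero padding
                        acc += k[a][b] * x[ii][jj]
            out[i][j] = acc
    return out


def _zip2(f, a: Grid, b: Grid) -> Grid:
    """Apply a binary function element-wise to two same-shaped grids."""
    return [[f(u, v) for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def _map(f, a: Grid) -> Grid:
    """Apply a unary function element-wise to a grid."""
    return [[f(u) for u in row] for row in a]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def zeros_kernel(size: int = 3) -> Grid:
    return [[0.0] * size for _ in range(size)]


class ConvGRUCell:
    """Hypothetical single-channel ConvGRU cell (no biases).

    z_t  = sigmoid(W_z * x_t + U_z * h_{t-1})
    r_t  = sigmoid(W_r * x_t + U_r * h_{t-1})
    h~_t = tanh(W_h * x_t + U_h * (r_t . h_{t-1}))
    h_t  = (1 - z_t) . h_{t-1} + z_t . h~_t
    where * is convolution and . is element-wise product.
    """

    def __init__(self, w_z: Grid, u_z: Grid, w_r: Grid, u_r: Grid, w_h: Grid, u_h: Grid):
        self.w_z, self.u_z = w_z, u_z
        self.w_r, self.u_r = w_r, u_r
        self.w_h, self.u_h = w_h, u_h

    def step(self, x: Grid, h: Grid) -> Grid:
        add, mul = operator.add, operator.mul
        # Update gate z and reset gate r.
        z = _map(sigmoid, _zip2(add, conv2d_same(x, self.w_z), conv2d_same(h, self.u_z)))
        r = _map(sigmoid, _zip2(add, conv2d_same(x, self.w_r), conv2d_same(h, self.u_r)))
        # Candidate state computed from the reset-gated previous state.
        r_h = _zip2(mul, r, h)
        h_cand = _map(math.tanh, _zip2(add, conv2d_same(x, self.w_h), conv2d_same(r_h, self.u_h)))
        # Blend old memory and candidate with the update gate.
        keep = _zip2(lambda zz, hh: (1.0 - zz) * hh, z, h)
        return _zip2(add, keep, _zip2(mul, z, h_cand))


def propagate(cell: ConvGRUCell, frames: List[Dict[int, Grid]],
              init_states: Dict[int, Grid]) -> Dict[int, Grid]:
    """Carry one hidden state per instance through a sequence of frames.

    frames: per frame, a map {instance_id: input feature grid}.
    init_states: {instance_id: initial hidden grid}, e.g. derived from first-frame masks.
    All instances share the same cell weights.
    """
    states = dict(init_states)
    for frame in frames:
        states = {inst: cell.step(frame[inst], h) for inst, h in states.items()}
    return states
```

In this sketch, sharing one cell across instances means the number of tracked objects only changes how many hidden states are stored, not the number of parameters. With all kernels set to zero, both gates equal 0.5 and the candidate is 0, so each step simply halves the stored memory, which makes a convenient sanity check.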
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
