TL;DR
MaskProp is a video instance segmentation method that extends Mask R-CNN with a mask propagation branch. It classifies, segments, and tracks object instances across video frames and remains robust to motion blur and occlusions.
Contribution
The paper introduces MaskProp, which adapts Mask R-CNN to video by adding a mask propagation branch. It reports state-of-the-art results on YouTube-VIS while using less labeled data than previous methods.
Findings
Achieves the best reported accuracy on the YouTube-VIS dataset.
Robust to motion blur and occlusions.
Uses significantly less labeled data than previous methods.
Abstract
We introduce a method for simultaneously classifying, segmenting and tracking object instances in a video sequence. Our method, named MaskProp, adapts the popular Mask R-CNN to video by adding a mask propagation branch that propagates frame-level object instance masks from each video frame to all the other frames in a video clip. This allows our system to predict clip-level instance tracks with respect to the object instances segmented in the middle frame of the clip. Clip-level instance tracks generated densely for each frame in the sequence are finally aggregated to produce video-level object instance segmentation and classification. Our experiments demonstrate that our clip-level instance segmentation makes our approach robust to motion blur and object occlusions in video. MaskProp achieves the best reported accuracy on the YouTube-VIS dataset, outperforming the ICCV 2019 video…
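The aggregation step described in the abstract, where overlapping clip-level instance tracks are merged into video-level tracks, can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy matching rule, the mean-IoU similarity, the 0.5 threshold, and all function names (`mask_iou`, `track_similarity`, `aggregate_clip_tracks`) are assumptions made for illustration. Masks are represented as plain sets of (row, col) pixels to keep the example self-contained.

```python
def mask_iou(a, b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def track_similarity(t1, t2):
    """Mean mask IoU over the frames that both tracks cover (0.0 if none)."""
    shared = t1.keys() & t2.keys()
    if not shared:
        return 0.0
    return sum(mask_iou(t1[f], t2[f]) for f in shared) / len(shared)


def aggregate_clip_tracks(clips, threshold=0.5):
    """Greedily merge clip-level tracks into video-level tracks.

    clips: list of clips in temporal order. Each clip is a list of tracks,
    and each track maps a frame index to a mask (set of pixels).
    The threshold and the greedy rule are illustrative assumptions.
    """
    video_tracks = []
    for clip in clips:
        used = set()  # video-level tracks already matched within this clip
        for track in clip:
            best_idx, best_score = None, threshold
            for idx, video_track in enumerate(video_tracks):
                if idx in used:
                    continue
                score = track_similarity(video_track, track)
                if score >= best_score:
                    best_idx, best_score = idx, score
            if best_idx is None:
                # No sufficiently similar track exists: start a new one.
                video_tracks.append(dict(track))
                used.add(len(video_tracks) - 1)
            else:
                # Extend the matched track; keep the first mask seen per frame.
                for frame, mask in track.items():
                    video_tracks[best_idx].setdefault(frame, mask)
                used.add(best_idx)
    return video_tracks


def square(row, col, size=4):
    """Hypothetical helper: a size x size square mask with top-left (row, col)."""
    return {(row + r, col + c) for r in range(size) for c in range(size)}


# Clip centered on frame 1 sees one object; clip centered on frame 2 sees
# the same object plus a second one that appears elsewhere in the image.
clip1 = [{0: square(0, 0), 1: square(0, 1), 2: square(0, 2)}]
clip2 = [
    {1: square(0, 1), 2: square(0, 2), 3: square(0, 3)},
    {1: square(10, 10), 2: square(10, 10), 3: square(10, 10)},
]
tracks = aggregate_clip_tracks([clip1, clip2])
```

In this toy input the first object's two clip-level tracks agree on frames 1 and 2, so they are merged into one video-level track spanning frames 0 to 3, while the second object starts a new track. A fuller version would also combine per-clip class scores and resolve conflicting masks on shared frames, which this sketch deliberately leaves out.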
Videos
Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation · YouTube
Taxonomy
Methods: Region Proposal Network · Softmax · Convolution · RoIAlign · Mask R-CNN
