Self-Supervised Video Object Segmentation by Motion-Aware Mask Propagation
Bo Miao, Mohammed Bennamoun, Yongsheng Gao, Ajmal Mian

TL;DR
This paper introduces MAMP, a self-supervised method for video object segmentation that is trained by frame reconstruction and segments objects through motion-aware mask propagation. It achieves state-of-the-art results among self-supervised methods without using annotations.
Contribution
The paper presents a novel self-supervised approach called MAMP that effectively handles fast motion and long-term matching in video segmentation without requiring labeled data.
Findings
MAMP outperforms existing self-supervised methods on DAVIS-2017 and YouTube-VOS datasets.
MAMP achieves performance comparable to supervised methods.
MAMP demonstrates strong generalization to unseen categories.
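The benchmark comparisons above are reported as mean J&F. J is region similarity, the Jaccard index (IoU) between predicted and ground-truth masks. F is the boundary F-measure. J&F averages the two means. The sketch below computes J and combines precomputed J and F scores. It assumes boolean masks and treats two empty masks as a perfect match, a common evaluation convention. The boundary F-measure itself is not implemented here, and the function names are hypothetical.

```python
import numpy as np


def region_similarity(pred, gt):
    """J: Jaccard index (IoU) between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as a perfect match (convention)
    return float(np.logical_and(pred, gt).sum() / union)


def mean_j_and_f(j_scores, f_scores):
    """J&F: average of mean region similarity and mean boundary F-measure."""
    return 0.5 * (float(np.mean(j_scores)) + float(np.mean(f_scores)))


j = region_similarity([[1, 1, 0, 0]], [[1, 0, 0, 0]])  # overlap 1 px, union 2 px
```

Because J&F is an average, a gain of 4.2% in mean J&F can come from region accuracy, boundary accuracy, or both.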
Abstract
We propose a self-supervised spatio-temporal matching method, coined Motion-Aware Mask Propagation (MAMP), for video object segmentation. MAMP leverages the frame reconstruction task for training without the need for annotations. During inference, MAMP extracts high-resolution features from each frame to build a memory bank from the features as well as the predicted masks of selected past frames. MAMP then propagates the masks from the memory bank to subsequent frames according to our proposed motion-aware spatio-temporal matching module to handle fast motion and long-term matching scenarios. Evaluations on the DAVIS-2017 and YouTube-VOS datasets show that MAMP achieves state-of-the-art performance with stronger generalization ability compared to existing self-supervised methods, i.e., 4.2% higher mean J&F on DAVIS-2017 and 4.85% higher mean J&F on the unseen categories of YouTube-VOS than…
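The memory-bank propagation step in the abstract can be illustrated with generic affinity-based label propagation. Each position in the new frame takes a softmax-weighted average of the labels at its top-k most similar memory positions. This is a minimal sketch under stated assumptions, not MAMP's implementation. It uses plain cosine similarity and omits the motion-aware matching module entirely. The frame-selection rule (first frame plus the most recent frames) and all names (`propagate`, `MemoryBank`, `topk`, `temperature`, `max_recent`) are hypothetical. Reusing the same routine for frame-reconstruction training, by propagating a reference frame's colour channels instead of masks, is also an assumption about how such methods are commonly set up.

```python
from collections import deque

import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def propagate(query_feat, mem_feats, mem_labels, topk=5, temperature=0.05):
    """Copy labels from memory positions to query positions via top-k affinity.

    query_feat: (N, C) features of the target frame (N = H * W positions).
    mem_feats:  (M, C) features of all memory-bank positions, flattened.
    mem_labels: (M, K) per-position labels: one-hot masks at inference, or
                (hypothetically) reference-frame colours during training.
    Returns an (N, K) array of propagated labels.
    """
    q = query_feat / np.linalg.norm(query_feat, axis=1, keepdims=True)
    m = mem_feats / np.linalg.norm(mem_feats, axis=1, keepdims=True)
    affinity = q @ m.T                                      # (N, M) cosine similarity
    k = min(topk, affinity.shape[1])
    idx = np.argpartition(-affinity, k - 1, axis=1)[:, :k]  # k most similar memory positions
    top = np.take_along_axis(affinity, idx, axis=1)         # (N, k) their similarities
    weights = softmax(top / temperature, axis=1)            # normalise over the k matches
    return (weights[..., None] * mem_labels[idx]).sum(axis=1)


class MemoryBank:
    """Hypothetical bank: keeps the first (annotated) frame plus the
    `max_recent` most recent frames. MAMP's own selection rule may differ."""

    def __init__(self, max_recent=3):
        self.first = None
        self.recent = deque(maxlen=max_recent)  # older frames drop out automatically

    def add(self, feat, labels):
        if self.first is None:
            self.first = (feat, labels)
        else:
            self.recent.append((feat, labels))

    def tensors(self):
        """Stack all stored frames into (M, C) features and (M, K) labels."""
        items = [self.first] + list(self.recent)
        feats = np.concatenate([f for f, _ in items], axis=0)
        labels = np.concatenate([lab for _, lab in items], axis=0)
        return feats, labels


# Usage: seed the bank with a first-frame mask, then segment the next frame.
rng = np.random.default_rng(0)
N, C, K = 64, 16, 3                              # positions, channels, objects
bank = MemoryBank(max_recent=2)
feat0 = rng.normal(size=(N, C))
mask0 = np.eye(K)[rng.integers(0, K, size=N)]    # one-hot first-frame mask
bank.add(feat0, mask0)

feat1 = rng.normal(size=(N, C))
feats, labels = bank.tensors()
probs = propagate(feat1, feats, labels, topk=5)  # (N, K) soft mask
pred1 = probs.argmax(axis=1)
bank.add(feat1, np.eye(K)[pred1])                # predicted mask joins the memory
```

Restricting each position to its top-k matches, rather than averaging over every memory position, keeps weakly related positions from blurring the propagated mask. MAMP's motion-aware module targets the failure cases this simple version does not address: fast motion and long-term matching.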
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods
