RPM-Net: Robust Pixel-Level Matching Networks for Self-Supervised Video Object Segmentation
Youngeun Kim, Seokeon Choi, Hankyeol Lee, Taekyung Kim, Changick Kim

TL;DR
This paper introduces RPM-Net, a self-supervised deep network for video object segmentation that matches pixels between frames using deformable convolutions, achieving state-of-the-art results without labeled data.
Contribution
RPM-Net is a novel architecture that leverages deformable convolution for pixel matching in self-supervised video segmentation, reducing reliance on labeled datasets.
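As a rough illustration of the operation named above, the sketch below evaluates a single 3x3 deformable convolution output: each kernel tap reads the feature map at its regular grid position plus a fractional offset, using bilinear interpolation. The function names `bilinear_sample` and `deformable_conv_at` are hypothetical, and the offsets are passed in by hand here, whereas in a trained network they would be predicted by a learned layer. This is not the paper's implementation.

```python
import numpy as np


def bilinear_sample(feat, y, x):
    """Sample a 2-D feature map at a fractional (y, x); out-of-range taps count as zero."""
    h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    wy, wx = y - y0, x - x0
    value = 0.0
    # Blend the four integer neighbours of (y, x) with bilinear weights.
    for yy, cy in ((y0, 1.0 - wy), (y0 + 1, wy)):
        for xx, cx in ((x0, 1.0 - wx), (x0 + 1, wx)):
            if 0 <= yy < h and 0 <= xx < w:
                value += cy * cx * feat[yy, xx]
    return value


def deformable_conv_at(feat, weight, offsets, cy, cx):
    """One output value of a 3x3 deformable convolution centred at (cy, cx).

    weight:  (3, 3) kernel.
    offsets: (3, 3, 2) array holding a (dy, dx) shift for every kernel tap
             (supplied by the caller in this sketch, an assumption).
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i, j]
            # Regular grid position (cy + i - 1, cx + j - 1) shifted by the offset.
            out += weight[i, j] * bilinear_sample(feat, cy + i - 1 + dy, cx + j - 1 + dx)
    return out
```

With all offsets set to zero this reduces to an ordinary 3x3 convolution at that location; non-zero offsets let each tap look at a different, possibly sub-pixel, position.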
Findings
Achieves state-of-the-art performance on DAVIS-2017, SegTrack-v2, and Youtube-Objects datasets.
Significantly narrows the performance gap between self-supervised and fully-supervised methods.
Improves robustness to camera shake, fast motion, deformation, and occlusion.
Abstract
In this paper, we introduce a self-supervised approach for video object segmentation without human-labeled data. Specifically, we present Robust Pixel-level Matching Networks (RPM-Net), a novel deep architecture that matches pixels between adjacent frames, using only color information from unlabeled videos for training. Technically, RPM-Net can be separated into two main modules. The embedding module first projects input images into a high-dimensional embedding space. Then the matching module with deformable convolution layers matches pixels between reference and target frames based on the embedding features. Unlike previous methods using deformable convolution, our matching module adopts deformable convolution to focus on similar features in spatio-temporally neighboring pixels. Our experiments show that the selective feature sampling improves the robustness to challenging problems in video…
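The matching step described in the abstract can be pictured with a much simpler stand-in: for every target pixel, compare its embedding with reference-frame embeddings in a small neighborhood, turn the similarities into softmax weights, and carry the reference labels over with those weights. The sketch below assumes a fixed square window and dot-product similarity; RPM-Net itself uses deformable convolution layers to choose which neighboring features to sample, which is not reproduced here. The function name `propagate_labels` and the parameters `radius` and `temperature` are hypothetical.

```python
import numpy as np


def propagate_labels(ref_emb, tgt_emb, ref_labels, radius=1, temperature=1.0):
    """Propagate per-pixel labels from a reference frame to a target frame.

    ref_emb, tgt_emb: (H, W, C) embedding maps of the two frames.
    ref_labels:       (H, W, K) one-hot (or soft) labels of the reference frame.
    Returns an (H, W, K) soft label map for the target frame.
    """
    h, w, c = ref_emb.shape
    k = ref_labels.shape[-1]
    out = np.zeros((h, w, k))
    for y in range(h):
        for x in range(w):
            # Fixed local window in the reference frame (a simplification).
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            cand_emb = ref_emb[y0:y1, x0:x1].reshape(-1, c)
            cand_lab = ref_labels[y0:y1, x0:x1].reshape(-1, k)
            # Dot-product similarity followed by a numerically stable softmax.
            logits = cand_emb @ tgt_emb[y, x] / temperature
            weights = np.exp(logits - logits.max())
            weights /= weights.sum()
            out[y, x] = weights @ cand_lab
    return out
```

Taking the argmax over the last axis of the result gives a hard segmentation mask for the target frame. The window radius bounds how far an object may move between the two frames in this simplified version.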
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
Methods: Deformable Convolution · Convolution
