RPM-Net: Robust Pixel-Level Matching Networks for Self-Supervised Video Object Segmentation

Youngeun Kim; Seokeon Choi; Hankyeol Lee; Taekyung Kim; Changick Kim

arXiv:1909.13247·cs.CV·October 11, 2019

TL;DR

This paper introduces RPM-Net, a self-supervised deep network for video object segmentation that matches pixels between frames using deformable convolutions, achieving state-of-the-art results without labeled data.

Contribution

RPM-Net is a novel architecture that leverages deformable convolution for pixel matching in self-supervised video segmentation, reducing reliance on labeled datasets.

Findings

01

Achieves state-of-the-art performance among self-supervised methods on the DAVIS-2017, SegTrack-v2, and YouTube-Objects datasets.

02

Significantly narrows the performance gap between self-supervised and fully-supervised methods.

03

Improves robustness to camera shake, fast motion, deformation, and occlusion.

Abstract

In this paper, we introduce a self-supervised approach for video object segmentation without human-labeled data. Specifically, we present Robust Pixel-level Matching Networks (RPM-Net), a novel deep architecture that matches pixels between adjacent frames, using only color information from unlabeled videos for training. Technically, RPM-Net can be separated into two main modules. The embedding module first projects input images into a high-dimensional embedding space. Then the matching module with deformable convolution layers matches pixels between reference and target frames based on the embedding features. Unlike previous methods using deformable convolution, our matching module adopts deformable convolution to focus on similar features in spatio-temporally neighboring pixels. Our experiments show that the selective feature sampling improves the robustness to challenging problems in video…
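
To make the embed-then-match idea concrete, here is a minimal pure-Python sketch of pixel-level matching between a reference and a target frame. It is not the authors' implementation. It assumes the embeddings are already computed, and it replaces the learned deformable-convolution sampling with a fixed square search window and a softmax over dot-product similarities. The function name `match_and_propagate` and the parameters `radius` and `temperature` are hypothetical, chosen only for illustration.

```python
import math


def match_and_propagate(ref_emb, tgt_emb, ref_values, radius=1, temperature=1.0):
    """Carry per-pixel values from a reference frame over to a target frame.

    ref_emb, tgt_emb: H x W grids of embedding vectors (lists of floats).
    ref_values: H x W grid of scalars (e.g. a mask or one colour channel).
    Assumption: a fixed (2 * radius + 1)^2 window stands in for the learned
    deformable sampling described in the abstract.
    """
    height, width = len(tgt_emb), len(tgt_emb[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            scores, values = [], []
            # Compare the target pixel with nearby reference pixels only.
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ry, rx = y + dy, x + dx
                    if 0 <= ry < height and 0 <= rx < width:
                        dot = sum(a * b for a, b in zip(tgt_emb[y][x], ref_emb[ry][rx]))
                        scores.append(dot / temperature)
                        values.append(ref_values[ry][rx])
            # Numerically stable softmax over the similarity scores.
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            out[y][x] = sum(w * v for w, v in zip(weights, values)) / total
    return out


# Toy 1 x 4 frames: the "object" pixel moves one step to the right.
background, obj = [10.0, 0.0], [0.0, 10.0]
ref_emb = [[background, obj, background, background]]
tgt_emb = [[background, background, obj, background]]
ref_mask = [[0.0, 1.0, 0.0, 0.0]]

propagated = match_and_propagate(ref_emb, tgt_emb, ref_mask)
rounded_mask = [round(v) for v in propagated[0]]
```

In this toy case the mask follows the object to its new position because the target pixel's embedding is most similar to the reference object pixel inside the window. In a self-supervised setting, the propagated quantity during training would be colour rather than a mask, which matches the abstract's statement that only color information from unlabeled videos is used.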

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques

Methods: Deformable Convolution · Convolution