Pixel-Level Matching for Video Object Segmentation using Convolutional Neural Networks

Jae Shin Yoon; Francois Rameau; Junsik Kim; Seokju Lee; Seunghak Shin; In So Kweon

arXiv:1708.05137·cs.CV·August 18, 2017

TL;DR

This paper introduces a CNN-based pixel-level matching method for video object segmentation that combines multi-layer features and a feature compression technique, achieving high accuracy, speed, and domain transferability.

Contribution

The paper presents a novel CNN architecture with feature compression and two-stage training for robust, category-agnostic video object segmentation at the pixel level.

Findings

01

Outperforms related methods in accuracy, speed, and stability.

02

Effective in domain transfer, including infrared data.

03

Handles arbitrary target objects regardless of category.

Abstract

We propose a novel video object segmentation algorithm based on pixel-level matching using Convolutional Neural Networks (CNN). Our network aims to distinguish the target area from the background on the basis of the pixel-level similarity between two object units. The proposed network represents a target object using features from different depth layers in order to take advantage of both the spatial details and the category-level semantic information. Furthermore, we propose a feature compression technique that drastically reduces the memory requirements while maintaining the capability of feature representation. Two-stage training (pre-training and fine-tuning) allows our network to handle any target object regardless of its category (even if the object's type does not belong to the pre-training data) or of variations in its appearance through a video sequence. Experiments on large…
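The matching idea in the abstract can be illustrated with a small NumPy sketch. It is not the paper's network. It assumes per-pixel features are already available, concatenates a hypothetical "shallow" and "deep" feature per pixel, uses a fixed linear projection as a stand-in for the learned feature compression, and replaces the learned similarity with nearest-neighbour cosine similarity. All names (`compress`, `match_pixels`, `make_frame`, `projection`) and the 0.5 threshold are assumptions made for illustration.

```python
import numpy as np

def l2_normalize(feats, eps=1e-8):
    """Scale each feature vector (last axis) to roughly unit length."""
    norm = np.linalg.norm(feats, axis=-1, keepdims=True)
    return feats / (norm + eps)

def compress(feats, projection):
    """Reduce the channel count with a per-pixel linear map (like a 1x1 convolution).
    Assumption: a fixed matrix stands in for a learned compression."""
    return feats @ projection

def match_pixels(target_feats, target_mask, query_feats, threshold=0.5):
    """Label query pixels by their best cosine similarity to any target-object pixel.

    target_feats: (H, W, C) features of the reference frame
    target_mask:  (H, W) boolean mask of the object (must contain at least one True)
    query_feats:  (H, W, C) features of the frame to segment
    """
    obj = l2_normalize(target_feats[target_mask])  # (N, C) object pixel features
    qry = l2_normalize(query_feats)                # (H, W, C)
    sim = qry @ obj.T                              # (H, W, N) pairwise cosine similarity
    score = sim.max(axis=-1)                       # best match per query pixel
    return score > threshold, score

# Toy data: each pixel carries a 3-channel "shallow" and a 3-channel "deep"
# feature, concatenated into 6 channels (hypothetical values).
obj_vec = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0])
bg_vec = np.array([0.0, 1.0, 0.0, 0.0, 1.0, 0.0])

def make_frame(mask):
    """Build an (H, W, 6) feature map from a boolean object mask."""
    return np.where(mask[..., None], obj_vec, bg_vec)

H, W = 6, 6
ref_mask = np.zeros((H, W), dtype=bool)
ref_mask[1:3, 1:3] = True      # object location in the reference frame
qry_mask = np.zeros((H, W), dtype=bool)
qry_mask[2:4, 3:5] = True      # the object has moved in the query frame

# Fixed 6 -> 2 channel projection that merges the shallow and deep channels.
projection = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0],
                       [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])

ref_feats = compress(make_frame(ref_mask), projection)
qry_feats = compress(make_frame(qry_mask), projection)
pred_mask, score = match_pixels(ref_feats, ref_mask, qry_feats)
```

In this toy setup the object and background features stay orthogonal after compression, so the predicted mask follows the object to its new position. Real features are far less separable, which is the gap a learned matching network is meant to close.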
