Pixel-Level Matching for Video Object Segmentation using Convolutional Neural Networks
Jae Shin Yoon, Francois Rameau, Junsik Kim, Seokju Lee, Seunghak Shin, In So Kweon

TL;DR
This paper introduces a CNN-based pixel-level matching method for video object segmentation that combines multi-layer features and a feature compression technique, achieving high accuracy, speed, and domain transferability.
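Combining multi-layer features with compression can be illustrated with a small NumPy sketch. It assumes a shallow (high-resolution, few channels) and a deep (low-resolution, many channels) feature map, upsamples the deep map by nearest-neighbour repetition, concatenates the two along the channel axis, and compresses the result with a 1x1 channel projection. The function names, the layer sizes, and the use of a random 1x1 projection are hypothetical illustrations, not the paper's architecture or its actual compression technique.

```python
import numpy as np

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_and_compress(shallow, deep, proj):
    """Concatenate shallow and deep features, then compress the channel dimension.

    shallow: (C1, H, W) low-level features carrying spatial detail
    deep:    (C2, H/f, W/f) high-level features carrying semantic information
    proj:    (K, C1 + C2) hypothetical 1x1 projection with K much smaller than C1 + C2
    """
    factor = shallow.shape[1] // deep.shape[1]
    deep_up = upsample_nearest(deep, factor)              # (C2, H, W)
    fused = np.concatenate([shallow, deep_up], axis=0)    # (C1 + C2, H, W)
    c, h, w = fused.shape
    # A 1x1 convolution is the same matrix product applied at every pixel.
    compressed = proj @ fused.reshape(c, h * w)           # (K, H*W)
    return compressed.reshape(-1, h, w)

rng = np.random.default_rng(0)
shallow = rng.standard_normal((64, 32, 32))
deep = rng.standard_normal((512, 8, 8))
proj = rng.standard_normal((32, 64 + 512)) / np.sqrt(64 + 512)
out = fuse_and_compress(shallow, deep, proj)
print(out.shape)                     # (32, 32, 32)
print((64 + 512) / out.shape[0])     # 18.0
```

In this toy setting each pixel is described by 32 values instead of 576, so storing per-pixel features takes 18 times less memory; whether a learned projection preserves enough information is exactly what training has to establish.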
Contribution
The paper presents a novel CNN architecture with feature compression and two-stage training for robust, category-agnostic video object segmentation at the pixel level.
Findings
Outperforms related methods in accuracy, speed, and stability.
Effective in domain transfer, including infrared data.
Handles arbitrary target objects regardless of category.
Abstract
We propose a novel video object segmentation algorithm based on pixel-level matching using Convolutional Neural Networks (CNN). Our network aims to distinguish the target area from the background on the basis of the pixel-level similarity between two object units. The proposed network represents a target object using features from different depth layers in order to take advantage of both the spatial details and the category-level semantic information. Furthermore, we propose a feature compression technique that drastically reduces the memory requirements while maintaining the capability of feature representation. Two-stage training (pre-training and fine-tuning) allows our network to handle any target object regardless of its category (even if the object's type does not belong to the pre-training data) or of variations in its appearance through a video sequence. Experiments on large…
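The idea of labelling pixels by pixel-level similarity can be sketched as a nearest-neighbour matcher: each pixel of a new frame is compared, by cosine similarity of its feature vector, against every pixel of an annotated reference frame, and it is labelled foreground if its best match lies inside the target mask. This is a hand-crafted stand-in under stated assumptions (the function names and the toy features are hypothetical); the paper's network learns its similarity rather than using a fixed cosine rule.

```python
import numpy as np

def l2_normalize(x, axis=0, eps=1e-8):
    """Scale feature vectors along `axis` to unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def match_pixels(ref_feat, ref_mask, query_feat):
    """Hypothetical matcher: label each query pixel by its most similar reference pixel.

    ref_feat:   (C, H, W) features of the annotated reference frame
    ref_mask:   (H, W) boolean target mask (must contain both fg and bg pixels)
    query_feat: (C, H', W') features of the frame to segment
    Returns a boolean (H', W') foreground mask.
    """
    c = ref_feat.shape[0]
    ref = l2_normalize(ref_feat.reshape(c, -1))     # (C, N) reference pixels
    qry = l2_normalize(query_feat.reshape(c, -1))   # (C, M) query pixels
    sim = qry.T @ ref                               # (M, N) cosine similarities
    mask = ref_mask.reshape(-1)
    fg_score = sim[:, mask].max(axis=1)             # best match among target pixels
    bg_score = sim[:, ~mask].max(axis=1)            # best match among background pixels
    return (fg_score > bg_score).reshape(query_feat.shape[1:])

rng = np.random.default_rng(1)

def toy_frame(mask):
    """Toy features: channel 0 marks the object, channel 1 the background, plus noise."""
    feat = np.zeros((8,) + mask.shape)
    feat[0][mask] = 1.0
    feat[1][~mask] = 1.0
    return feat + 0.05 * rng.standard_normal(feat.shape)

ref_mask = np.zeros((6, 6), dtype=bool)
ref_mask[1:3, 1:3] = True
qry_mask = np.zeros((6, 6), dtype=bool)
qry_mask[2:5, 3:6] = True                           # object moved and grew
pred = match_pixels(toy_frame(ref_mask), ref_mask, toy_frame(qry_mask))
print((pred == qry_mask).all())                     # True
```

Because every reference pixel's feature vector must be kept for matching, memory grows with C x H x W; reducing the channel count C, as the feature compression described above aims to do, shrinks that cost directly.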
