Fast Video Object Segmentation via Mask Transfer Network

Tao Zhuo; Zhiyong Cheng; Mohan Kankanhalli

arXiv:1908.10717 · cs.CV · August 29, 2019 · 1 citation

TL;DR

This paper introduces a fast and efficient mask transfer network for video object segmentation that eliminates the need for fine-tuning and achieves real-time processing speeds while maintaining competitive accuracy.

Contribution

The proposed Mask Transfer Network (MTN) significantly improves VOS speed by using global pixel matching on downsampled features without fine-tuning or relying on temporal cues.
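To make "transferring a mask by global pixel matching" concrete, here is a minimal sketch in plain Python. It is not the authors' MTN. It assumes per-pixel feature vectors are already available for both frames at the same downsampled resolution, uses cosine similarity as the matching score, and gives each target pixel the label of its single best-matching reference pixel anywhere in the frame. The names `cosine` and `transfer_mask` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def transfer_mask(ref_feats: List[Vector], ref_mask: List[int],
                  tgt_feats: List[Vector]) -> List[int]:
    """Hypothetical mask transfer: label each target pixel with the mask value
    of its most similar reference pixel.

    All inputs are flattened (row-major) per-pixel lists taken from a
    downsampled feature map; ref_mask holds one label per reference pixel.
    """
    if len(ref_feats) != len(ref_mask):
        raise ValueError("ref_feats and ref_mask must have the same length")
    out = []
    for t in tgt_feats:
        # Global matching: compare against every reference pixel,
        # not just a local window around the same position.
        best = max(range(len(ref_feats)), key=lambda i: cosine(t, ref_feats[i]))
        out.append(ref_mask[best])
    return out


# Usage: two foreground-like and two background-like reference pixels.
ref = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
mask = [1, 0, 1, 0]
print(transfer_mask(ref, mask, [[0.9, 0.2], [0.2, 0.9]]))
```

A soft variant would replace the hard argmax with a softmax-weighted vote over reference labels; the hard version is shown only because it is the simplest form of the idea.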

Findings

01. Achieves 37 fps on the DAVIS datasets.
02. Maintains accuracy competitive with state-of-the-art methods.
03. Requires neither fine-tuning nor object category information.

Abstract

Accuracy and processing speed are two important factors that affect the use of video object segmentation (VOS) in real applications. With advances in deep neural networks, accuracy has improved significantly; however, speed still falls far short of real-time requirements because of complicated network designs, such as the first-frame fine-tuning step. To overcome this limitation, we propose a novel mask transfer network (MTN), which greatly boosts the processing speed of VOS while achieving reasonable accuracy. The basic idea of MTN is to transfer the reference mask to the target frame via an efficient global pixel matching strategy. Matching pixels globally between the reference frame and the target frame ensures good matching results. To increase matching speed, we perform the matching on a downsampled feature map with 1/32 of…
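The speed argument above rests on the cost of global matching: if every target pixel is compared with every reference pixel, the number of comparisons grows quadratically with the number of feature-map pixels. The sketch below counts those comparisons for a given per-side downsampling stride, assuming dense all-pairs matching. The function name, the 480 x 854 frame size, and the strides are illustrative assumptions; because the abstract is cut off at "1/32 of…", the code does not assume what that ratio refers to.

```python
def pairwise_comparisons(height: int, width: int, stride: int) -> int:
    """Hypothetical cost model: number of reference/target pixel pairs compared
    by dense global matching on a feature map downsampled by `stride` per side."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    h = (height + stride - 1) // stride  # ceil(height / stride)
    w = (width + stride - 1) // stride   # ceil(width / stride)
    n = h * w                            # pixels per frame at feature resolution
    return n * n                         # every target pixel vs. every reference pixel


# Usage: compare full resolution with a few illustrative strides.
for s in (1, 8, 16, 32):
    print(s, pairwise_comparisons(480, 854, s))
```

Because the count is quadratic in the pixel count, doubling the stride along each side cuts the number of comparisons by roughly a factor of 16.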

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Video Surveillance and Tracking Methods

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings