Directional Deep Embedding and Appearance Learning for Fast Video Object Segmentation
Yingjie Yin, De Xu, Xingang Wang, Lei Zhang

TL;DR
This paper introduces DDEAL, a semi-supervised video object segmentation method that avoids online fine-tuning by combining directional deep embedding with appearance learning. It reaches 74.8% J & F on DAVIS 2017 and 71.3% G on YouTube-VOS while running at 25 fps.
Contribution
The paper proposes a novel DDEAL approach with a global directional matching module and a directional appearance model, eliminating the need for online fine-tuning in VOS.
Findings
Achieves 74.8% J & F score on DAVIS 2017
Attains 71.3% G score on YouTube-VOS
Runs at 25 fps without online fine-tuning
Abstract
Most recent semi-supervised video object segmentation (VOS) methods rely on fine-tuning deep convolutional neural networks online using the given mask of the first frame or predicted masks of subsequent frames. However, the online fine-tuning process is usually time-consuming, limiting the practical use of such methods. We propose a directional deep embedding and appearance learning (DDEAL) method, which is free of the online fine-tuning process, for fast VOS. First, a global directional matching module, which can be efficiently implemented by parallel convolutional operations, is proposed to learn a semantic pixel-wise embedding as an internal guidance. Second, an effective statistics-based directional appearance model is proposed to represent the target and background on a spherical embedding space for VOS. Equipped with the global directional matching module and the directional…
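To make the idea of matching on a spherical embedding space concrete, the sketch below compares pixel embeddings by direction only, using cosine similarity against reference embeddings taken from the target and the background. It is an illustration under stated assumptions, not the DDEAL module itself: the paper's embedding is learned and its matching is implemented with parallel convolutions, whereas the function names (`l2_normalize`, `cosine_match`, `directional_match`) and the 2-D toy vectors here are hypothetical.

```python
import math

def l2_normalize(v):
    """Project a vector onto the unit sphere (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine_match(query, references):
    """Best cosine similarity between one query embedding and a set of references."""
    q = l2_normalize(query)
    return max(sum(a * b for a, b in zip(q, l2_normalize(r))) for r in references)

def directional_match(frame_embeddings, target_refs, background_refs):
    """Label each pixel embedding 1 (target) or 0 (background) by the closer direction."""
    labels = []
    for e in frame_embeddings:
        fg = cosine_match(e, target_refs)      # similarity to the target directions
        bg = cosine_match(e, background_refs)  # similarity to the background directions
        labels.append(1 if fg > bg else 0)
    return labels

# Toy data (assumed, not from the paper): reference embeddings from a first frame
# and three pixel embeddings from a later frame.
target_refs = [[1.0, 0.1], [0.9, 0.0]]
background_refs = [[0.0, 1.0], [-1.0, 0.2]]
frame = [[2.0, 0.1], [0.1, 3.0], [-0.5, 0.1]]

labels = directional_match(frame, target_refs, background_refs)
print(labels)
```

Because every vector is normalized before comparison, only its direction matters: the first pixel is assigned to the target even though its magnitude differs from both target references.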
Taxonomy
Topics: Visual Attention and Saliency Detection · Face recognition and analysis · Advanced Neural Network Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
