Make One-Shot Video Object Segmentation Efficient Again
Tim Meinhardt, Laura Leal-Taixe

TL;DR
This paper introduces e-OSVOS, an efficient one-shot video object segmentation method that decouples detection and segmentation, uses meta-learned initialization and learning rates, and applies online adaptation to improve speed and accuracy.
Contribution
The paper proposes e-OSVOS, a fine-tuning VOS approach whose test-time optimization is set by meta-learning rather than a handcrafted hyperparameter search and is complemented by online adaptation, reducing runtime while maintaining state-of-the-art accuracy.
Findings
Achieves state-of-the-art results on DAVIS and YouTube-VOS datasets.
Significantly reduces test runtime compared to previous fine-tuning methods.
Maintains high segmentation accuracy with online model adaptation.
Abstract
Video object segmentation (VOS) describes the task of segmenting a set of objects in each frame of a video. In the semi-supervised setting, the first mask of each object is provided at test time. Following the one-shot principle, fine-tuning VOS methods train a segmentation model separately on each given object mask. However, recently the VOS community has deemed such a test time optimization and its impact on the test runtime as unfeasible. To mitigate the inefficiencies of previous fine-tuning approaches, we present efficient One-Shot Video Object Segmentation (e-OSVOS). In contrast to most VOS approaches, e-OSVOS decouples the object detection task and predicts only local segmentation masks by applying a modified version of Mask R-CNN. The one-shot test runtime and performance are optimized without a laborious and handcrafted hyperparameter search. To this end, we meta learn the…
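The one-shot principle with a meta-learned initialization and meta-learned learning rates can be illustrated with a toy sketch. It is not the authors' implementation: the linear per-pixel classifier stands in for the modified Mask R-CNN, using one learning rate per weight is an assumption of this sketch, and the names `one_shot_fine_tune`, `meta_init`, and `meta_lrs` and all numeric values are hypothetical. In the paper the initialization and learning rates are learned; here they are fixed by hand for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(weights, features, labels):
    """Mean binary cross-entropy of a linear per-pixel classifier."""
    total = 0.0
    for x, y in zip(features, labels):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def bce_grad(weights, features, labels):
    """Gradient of the mean binary cross-entropy w.r.t. the weights."""
    grad = [0.0] * len(weights)
    for x, y in zip(features, labels):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for j, xi in enumerate(x):
            grad[j] += (p - y) * xi / len(labels)
    return grad


def one_shot_fine_tune(init_weights, learning_rates, features, labels, steps=10):
    """Fine-tune on the single annotated frame.

    Each weight is updated with its own learning rate (an assumption of
    this sketch); the initialization itself is left untouched.
    """
    weights = list(init_weights)
    for _ in range(steps):
        grad = bce_grad(weights, features, labels)
        weights = [w - lr * g for w, lr, g in zip(weights, learning_rates, grad)]
    return weights


# Hypothetical first-frame data: one feature vector per pixel, 1 = object.
first_frame_features = [
    [1.0, 2.0, 1.0],
    [1.0, -1.5, 0.5],
    [1.0, 1.0, -1.0],
    [1.0, -2.0, -0.5],
]
first_frame_mask = [1, 0, 1, 0]

# Stand-ins for the meta-learned quantities (hand-picked here).
meta_init = [0.0, 0.1, 0.0]
meta_lrs = [0.1, 1.0, 0.01]

adapted = one_shot_fine_tune(meta_init, meta_lrs, first_frame_features, first_frame_mask)
loss_before = bce_loss(meta_init, first_frame_features, first_frame_mask)
loss_after = bce_loss(adapted, first_frame_features, first_frame_mask)
```

With a good initialization and well-chosen learning rates, only a few gradient steps on the first frame are needed, which is where the runtime saving of such an approach would come from.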
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Domain Adaptation and Few-Shot Learning
Methods: Region Proposal Network · VOS · Softmax · Convolution · RoIAlign · Mask R-CNN
