Make One-Shot Video Object Segmentation Efficient Again

Tim Meinhardt; Laura Leal-Taixé

arXiv:2012.01866 · cs.CV · December 4, 2020 · 5 cites

TL;DR

This paper introduces e-OSVOS, an efficient one-shot video object segmentation method that decouples detection and segmentation, uses meta-learned initialization and learning rates, and applies online adaptation to improve speed and accuracy.

Contribution

It proposes a novel, efficient VOS approach that optimizes test-time performance through meta-learning and online adaptation, reducing runtime while maintaining state-of-the-art accuracy.

Findings

01. Achieves state-of-the-art results on the DAVIS and YouTube-VOS datasets among one-shot fine-tuning methods.

02. Substantially reduces test runtime compared to previous fine-tuning approaches.

03. Uses online model adaptation to maintain segmentation accuracy over the course of a video.

Abstract

Video object segmentation (VOS) describes the task of segmenting a set of objects in each frame of a video. In the semi-supervised setting, the first mask of each object is provided at test time. Following the one-shot principle, fine-tuning VOS methods train a segmentation model separately on each given object mask. However, recently the VOS community has deemed such a test time optimization and its impact on the test runtime as unfeasible. To mitigate the inefficiencies of previous fine-tuning approaches, we present efficient One-Shot Video Object Segmentation (e-OSVOS). In contrast to most VOS approaches, e-OSVOS decouples the object detection task and predicts only local segmentation masks by applying a modified version of Mask R-CNN. The one-shot test runtime and performance are optimized without a laborious and handcrafted hyperparameter search. To this end, we meta learn the…
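
To make the one-shot principle concrete, the following is a minimal sketch of test-time fine-tuning on a single first-frame mask, starting from a given initialization and using one learning rate per parameter. It is not the e-OSVOS implementation: the linear per-pixel classifier, the toy data, and the names `fine_tune`, `theta_init`, and `learning_rates` are all assumptions made for illustration, and the initialization and learning rates are fixed placeholder values here rather than meta-learned ones.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(theta, xs, ys):
    """Mean binary cross-entropy of a linear per-pixel classifier."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(sum(w * v for w, v in zip(theta, x)))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)

def bce_grad(theta, xs, ys):
    """Gradient of the mean binary cross-entropy with respect to theta."""
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        p = sigmoid(sum(w * v for w, v in zip(theta, x)))
        for i, v in enumerate(x):
            grad[i] += (p - y) * v / len(xs)
    return grad

def fine_tune(theta_init, learning_rates, xs, ys, steps):
    """One-shot fine-tuning: a few gradient steps on the single labelled frame.

    Each parameter has its own learning rate. In a meta-learning setup both
    theta_init and learning_rates would be learned in an outer loop, which is
    omitted here.
    """
    theta = list(theta_init)
    for _ in range(steps):
        grad = bce_grad(theta, xs, ys)
        theta = [w - lr * g for w, lr, g in zip(theta, learning_rates, grad)]
    return theta

# Toy "first frame": each pixel is (feature, constant bias input); label 1 = object.
first_frame_pixels = [(2.0, 1.0), (1.5, 1.0), (-1.0, 1.0), (-2.0, 1.0)]
first_frame_mask = [1, 1, 0, 0]

theta_init = [0.0, 0.0]        # placeholder for a meta-learned initialization
learning_rates = [0.5, 0.1]    # placeholder for meta-learned per-parameter rates

theta = fine_tune(theta_init, learning_rates,
                  first_frame_pixels, first_frame_mask, steps=5)
loss_before = bce_loss(theta_init, first_frame_pixels, first_frame_mask)
loss_after = bce_loss(theta, first_frame_pixels, first_frame_mask)
```

The point of learning the initialization and the learning rates, rather than tuning them by hand, is that only a handful of update steps like these are needed at test time, which is where the runtime of fine-tuning methods is spent.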

Code & Models

Videos

Make One-Shot Video Object Segmentation Efficient Again · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Domain Adaptation and Few-Shot Learning

Methods: Region Proposal Network · VOS · Softmax · Convolution · RoIAlign · Mask R-CNN