Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos

Bin Zhao; Goutam Bhat; Martin Danelljan; Luc Van Gool; Radu Timofte

arXiv:2101.02196 · cs.CV · January 7, 2021

TL;DR

This paper presents a method to generate accurate video object segmentation masks from bounding box annotations by exploiting spatio-temporal consistencies, enabling weakly supervised training and improving generalization in video segmentation and tracking.

Contribution

The paper introduces a spatio-temporal aggregation module that mines consistencies in object and background appearance across frames, so that segmentation masks can be generated from per-frame bounding boxes and used for large-scale weakly supervised training.

Findings

01 Achieves state-of-the-art results in video object segmentation.

02 Improves generalization in tracking tasks.

03 Enables large-scale mask generation from bounding box annotations.

Abstract

Segmenting objects in videos is a fundamental computer vision task. The current deep learning-based paradigm offers a powerful but data-hungry solution. However, current datasets are limited by the cost and human effort of annotating object masks in videos, which in turn limits the performance and generalization capabilities of existing video segmentation methods. To address this issue, we explore bounding boxes as a weaker form of annotation. We introduce a method for generating segmentation masks from per-frame bounding box annotations in videos. To this end, we propose a spatio-temporal aggregation module that effectively mines consistencies in the object and background appearance across multiple frames. We use the resulting accurate masks for weakly supervised training of video object segmentation (VOS) networks. We generate segmentation masks for large-scale tracking datasets,…
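To make the gap between a box annotation and a mask concrete, here is a minimal Python sketch of the trivial baseline: filling each per-frame box with ones to get a rectangular mask. This is not the paper's spatio-temporal aggregation module. It only shows the coarse starting point that such a method has to refine. The `(x, y, w, h)` pixel convention and the function names `box_to_mask` and `mask_to_box` are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Assumed convention: (x, y, w, h) in pixels, origin at the top-left corner.
Box = Tuple[int, int, int, int]
Mask = List[List[int]]


def box_to_mask(box: Box, height: int, width: int) -> Mask:
    """Fill the box region with 1s, clipping the box to the frame."""
    x, y, w, h = box
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, width), min(y + h, height)
    return [
        [1 if (y0 <= r < y1 and x0 <= c < x1) else 0 for c in range(width)]
        for r in range(height)
    ]


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Return the tight box around all non-zero pixels, or None if empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (cols[0], rows[0], cols[-1] - cols[0] + 1, rows[-1] - rows[0] + 1)


if __name__ == "__main__":
    # One hypothetical box per frame of a tiny 4x5 video.
    boxes = [(1, 1, 2, 2), (2, 1, 2, 2)]
    for frame_index, box in enumerate(boxes):
        mask = box_to_mask(box, height=4, width=5)
        print(frame_index, mask_to_box(mask))
```

A rectangular mask labels every background pixel inside the box as object. That error is what a box-to-mask method has to remove before the masks can supervise a VOS network.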

Code & Models

Repositories

visionml/pytracking
PyTorch · Official

Taxonomy

Methods: VOS