Fast Online Object Tracking and Segmentation: A Unifying Approach

Qiang Wang; Li Zhang; Luca Bertinetto; Weiming Hu; Philip H.S. Torr

arXiv:1812.05050·cs.CV·May 7, 2019·98 cites

TL;DR

SiamMask is a real-time, unified approach for visual object tracking and semi-supervised video segmentation that achieves state-of-the-art performance and high speed by augmenting Siamese networks with segmentation capabilities.

Contribution

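The paper introduces SiamMask, a simple method that unifies visual object tracking and semi-supervised video object segmentation at real-time speed. It improves the offline training of fully-convolutional Siamese trackers by augmenting their loss with a binary segmentation task.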
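The idea of augmenting a tracking loss with a binary segmentation term can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of a logistic loss with labels in {-1, +1} for both terms, and the `mask_weight` parameter are assumptions made for this example, since this summary does not give the exact loss terms or their weights.

```python
import numpy as np

def binary_logistic_loss(logits, labels):
    """Mean of log(1 + exp(-y * x)) for labels y in {-1, +1}."""
    # np.logaddexp(0, z) evaluates log(1 + exp(z)) in a numerically stable way.
    return float(np.mean(np.logaddexp(0.0, -labels * logits)))

def combined_loss(score_logits, score_labels, mask_logits, mask_labels,
                  mask_weight=1.0):
    """Tracking (similarity) loss plus a weighted per-pixel segmentation loss.

    mask_weight is a hypothetical balancing factor, not a value from the paper.
    """
    track_loss = binary_logistic_loss(score_logits, score_labels)
    seg_loss = binary_logistic_loss(mask_logits, mask_labels)
    return track_loss + mask_weight * seg_loss
```

With this formulation, a network that outputs zeros everywhere pays log 2 for each term, and the total shrinks towards zero as both the similarity scores and the mask pixels become confidently correct.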

Findings

01

Runs online at 55 frames per second, producing class-agnostic segmentation masks and rotated bounding boxes.

02

Sets a new state of the art among real-time trackers on VOT-2018.

03

Demonstrates competitive performance and the best speed for semi-supervised video object segmentation on DAVIS-2016 and DAVIS-2017.

Abstract

In this paper we illustrate how to perform both visual object tracking and semi-supervised video object segmentation, in real-time, with a single simple approach. Our method, dubbed SiamMask, improves the offline training procedure of popular fully-convolutional Siamese approaches for object tracking by augmenting their loss with a binary segmentation task. Once trained, SiamMask solely relies on a single bounding box initialisation and operates online, producing class-agnostic object segmentation masks and rotated bounding boxes at 55 frames per second. Despite its simplicity, versatility and fast speed, our strategy allows us to establish a new state of the art among real-time trackers on VOT-2018, while at the same time demonstrating competitive performance and the best speed for the semi-supervised video object segmentation task on DAVIS-2016 and DAVIS-2017. The project website is…
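To make the relationship between a segmentation mask and a bounding box concrete, the sketch below derives a box from a binary mask. It is a simplified stand-in: it computes an axis-aligned box from the extreme foreground pixels, not the rotated box mentioned in the abstract, and the function name `mask_to_box` is hypothetical.

```python
import numpy as np

def mask_to_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the foreground pixels.

    Returns None when the mask contains no foreground pixel.
    """
    ys, xs = np.nonzero(mask)  # row and column indices of non-zero pixels
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

A predicted probability map would first need to be binarised, for example with `prob_map > 0.5`; that threshold is an assumption for illustration and not a value taken from the paper.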

Taxonomy

Topics: Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings