Two stages for visual object tracking
Fei Chen, Fuhan Zhang, Xiaodong Wang

TL;DR
This paper introduces a two-stage visual object tracker: a Siamese network first detects the target, then a segmentation module refines the estimate, achieving state-of-the-art results on multiple benchmarks.
Contribution
The novel two-stage tracker integrates detection and segmentation, improving accuracy over traditional Siamese-based trackers.
Findings
Achieves state-of-the-art EAO scores on the VOT2016 (52.6), VOT2018 (51.3), and VOT2019 (39.0) datasets.
Combines detection and segmentation for more precise object tracking.
Outperforms existing methods on multiple benchmarks.
Abstract
Siamese-based trackers have achieved promising performance on visual object tracking tasks. Most existing Siamese-based trackers contain two separate branches for tracking: a classification branch and a bounding box regression branch. Image segmentation offers an alternative way to obtain a more accurate target region. In this paper, we propose a novel tracker with two stages: detection and segmentation. The detection stage locates the target with Siamese networks. A segmentation module then produces more accurate tracking results, starting from the coarse state estimate given by the first stage. We conduct experiments on four benchmarks. Our approach achieves state-of-the-art results, with an EAO of 52.6 on VOT2016, 51.3 on VOT2018, and 39.0 on VOT2019.
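The detect-then-segment idea can be illustrated with a toy sketch. This is not the authors' implementation. The functions `cross_correlate`, `detect`, `segment`, `mask_to_box`, and `track_frame` are hypothetical names. A plain dot-product cross-correlation on raw pixels stands in for the learned Siamese features, and a fixed intensity threshold (the hypothetical `threshold` parameter) stands in for the learned segmentation module. Stage 1 takes the peak of the response map as a coarse box the size of the template. Stage 2 segments a slightly enlarged crop around that box and tightens the box to the resulting mask.

```python
from typing import List, Set, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def cross_correlate(search: Grid, template: Grid) -> Grid:
    """Slide the template over the search region (valid mode) and return a
    response map of dot products. Raw pixels stand in for Siamese features."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    response = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            row.append(sum(search[y + i][x + j] * template[i][j]
                           for i in range(th) for j in range(tw)))
        response.append(row)
    return response


def detect(search: Grid, template: Grid) -> Box:
    """Stage 1 (hypothetical): a coarse box at the response peak, sized like
    the template. max() returns the first peak in row-major order on ties."""
    response = cross_correlate(search, template)
    x, y = max(
        ((x, y) for y in range(len(response)) for x in range(len(response[0]))),
        key=lambda p: response[p[1]][p[0]],
    )
    return x, y, len(template[0]), len(template)


def segment(search: Grid, box: Box, margin: int = 1,
            threshold: float = 0.5) -> Set[Tuple[int, int]]:
    """Stage 2 (hypothetical): label foreground pixels inside the coarse box
    enlarged by `margin`. A threshold stands in for a segmentation network."""
    x, y, w, h = box
    height, width = len(search), len(search[0])
    x0, y0 = max(0, x - margin), max(0, y - margin)
    x1, y1 = min(width, x + w + margin), min(height, y + h + margin)
    return {(xx, yy) for yy in range(y0, y1) for xx in range(x0, x1)
            if search[yy][xx] > threshold}


def mask_to_box(mask: Set[Tuple[int, int]], fallback: Box) -> Box:
    """Tight bounding box around the mask; keep the coarse box if it is empty."""
    if not mask:
        return fallback
    xs = [p[0] for p in mask]
    ys = [p[1] for p in mask]
    return min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1


def track_frame(search: Grid, template: Grid) -> Tuple[Box, Box]:
    """Run both stages on one frame and return (coarse, refined) boxes."""
    coarse = detect(search, template)
    refined = mask_to_box(segment(search, coarse), coarse)
    return coarse, refined


if __name__ == "__main__":
    # 6x6 frame with a 3-wide, 2-tall bright object at columns 2-4, rows 2-3.
    frame = [[0.0] * 6 for _ in range(6)]
    for row in (2, 3):
        for col in (2, 3, 4):
            frame[row][col] = 1.0
    template = [[1.0, 1.0], [1.0, 1.0]]  # 2x2 exemplar
    coarse, refined = track_frame(frame, template)
    print(coarse, refined)  # (2, 2, 2, 2) (2, 2, 3, 2)
```

In this toy frame the template-sized coarse box covers only part of the object, while the mask-derived box recovers its full 3-pixel width. This mirrors, in a very simplified form, the motivation for refining a Siamese estimate with segmentation.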
