TL;DR
D3S is a single-shot segmentation tracker that combines two complementary target models to improve robustness and accuracy. It outperforms prior trackers on VOT2016, VOT2018, and GOT-10k without dataset-specific finetuning.
Contribution
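The paper presents D3S, a discriminative single-shot segmentation tracker that applies two target models with complementary geometric properties: one invariant to a broad range of transformations, including non-rigid deformations, and one that assumes a rigid object. Together they provide robust tracking and online target segmentation.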
Findings
Outperforms all trackers on VOT2016, VOT2018, and GOT-10k benchmarks.
Achieves near state-of-the-art results on TrackingNet without finetuning.
Runs close to real-time, significantly faster than comparable segmentation trackers.
Abstract
Template-based discriminative trackers are currently the dominant tracking paradigm due to their robustness, but are restricted to bounding box tracking and a limited range of transformation models, which reduces their localization accuracy. We propose a discriminative single-shot segmentation tracker, D3S, which narrows the gap between visual object tracking and video object segmentation. A single-shot network applies two target models with complementary geometric properties, one invariant to a broad range of transformations, including non-rigid deformations, the other assuming a rigid object, to simultaneously achieve high robustness and online target segmentation. Without per-dataset finetuning and trained only for segmentation as the primary output, D3S outperforms all trackers on VOT2016, VOT2018 and GOT-10k benchmarks and performs close to the state-of-the-art trackers on the…
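The idea of pairing a transformation-invariant model with a rigid one can be illustrated with a small NumPy sketch. This is not the D3S implementation: every function name is hypothetical, and the top-k cosine similarity, the Gaussian location prior, and the elementwise product used for fusion are simplifying assumptions chosen only to show how a per-pixel appearance model and a rigid localization model can complement each other.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def topk_mean_similarity(pixels, exemplars, k=3):
    """Mean of the k highest cosine similarities between each pixel and the exemplars."""
    sims = l2_normalize(pixels) @ l2_normalize(exemplars).T  # shape (N, M)
    k = min(k, sims.shape[1])
    top = np.partition(sims, -k, axis=1)[:, -k:]
    return top.mean(axis=1)

def invariant_posterior(features, fg_exemplars, bg_exemplars, k=3, temperature=0.1):
    """Per-pixel foreground probability (assumed form of the invariant model).

    Each pixel is compared with stored foreground and background features
    independently of its position, so the result does not depend on the
    target's shape or pose.
    """
    h, w, c = features.shape
    pixels = features.reshape(-1, c)
    fg = topk_mean_similarity(pixels, fg_exemplars, k)
    bg = topk_mean_similarity(pixels, bg_exemplars, k)
    logits = np.stack([fg, bg], axis=1) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs[:, 0].reshape(h, w)

def rigid_location_prior(response, sigma=2.0):
    """Gaussian centred on the peak of a localization response (assumed form
    of the rigid model's output; the response itself is taken as given)."""
    h, w = response.shape
    cy, cx = np.unravel_index(np.argmax(response), response.shape)
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def fuse(posterior, location_prior, threshold=0.5):
    """Illustrative fusion: keep pixels that look like the target AND lie near
    the rigid model's position estimate."""
    return (posterior * location_prior) > threshold
```

In this sketch the invariant model alone would also fire on a distractor with target-like appearance elsewhere in the frame; multiplying by the location prior suppresses it, while the per-pixel posterior still lets the mask follow the target's actual shape.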
