TDIOT: Target-driven Inference for Deep Video Object Tracking

Filiz Gurkan; Llukman Cerkezi; Ozgun Cirakman; Bilge Gunsel

arXiv:2103.11017·cs.CV·October 4, 2021

TL;DR

TDIOT is an inference architecture built on a pre-trained Mask R-CNN that performs detection and tracking jointly, combining appearance-similarity matching, local search, scale adaptation, and verification.

Contribution

The paper presents a new inference architecture that enhances deep video object tracking by integrating detection with tracking components without additional training.

Findings

01 Outperforms state-of-the-art short-term trackers in accuracy.

02 Provides comparable long-term tracking performance.

03 Handles scale changes and tracking discontinuities effectively.

Abstract

Recent tracking-by-detection approaches use deep object detectors as the target detection baseline because of their high performance on still images. For effective video object tracking, object detection is integrated with a data association step performed by either a custom-designed inference architecture or end-to-end joint training for tracking purposes. In this work, we adopt the former approach and use the pre-trained Mask R-CNN deep object detector as the baseline. We introduce a novel inference architecture placed on top of the FPN-ResNet101 backbone of Mask R-CNN to jointly perform detection and tracking, without requiring additional training for tracking purposes. The proposed single object tracker, TDIOT, applies appearance similarity-based temporal matching for data association. To tackle tracking discontinuities, we incorporate a local search and matching module into the…
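
The appearance similarity-based temporal matching mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the use of cosine similarity, the threshold value, and the names `cosine_similarity`, `match_target`, and `min_similarity` are all assumptions made for illustration, since the excerpt does not state which similarity measure or features TDIOT uses.

```python
import math
from typing import Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_target(
    target_feature: Sequence[float],
    candidate_features: Sequence[Sequence[float]],
    min_similarity: float = 0.5,  # hypothetical threshold, not taken from the paper
) -> Tuple[Optional[int], float]:
    """Pick the candidate detection most similar to the target appearance.

    Returns (index, similarity) of the best candidate, or (None, best_similarity)
    when no candidate reaches min_similarity.
    """
    best_index: Optional[int] = None
    best_similarity = -1.0
    for index, feature in enumerate(candidate_features):
        similarity = cosine_similarity(target_feature, feature)
        if similarity > best_similarity:
            best_index, best_similarity = index, similarity
    if best_index is None or best_similarity < min_similarity:
        return None, best_similarity
    return best_index, best_similarity


# Toy example: a 2-D target appearance vector and two candidate detections.
target = [1.0, 0.0]
candidates = [[0.0, 1.0], [0.9, 0.1]]
matched_index, matched_similarity = match_target(target, candidates)
```

In this sketch, a `None` result stands in for a tracking discontinuity, the situation the abstract says is handled by a local search and matching module.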

Code & Models

Repositories

msprITU/TDIOT · tf · Official

Taxonomy

Methods: Region Proposal Network · Softmax · RoIAlign · Convolution · Mask R-CNN