TDT: Teaching Detectors to Track without Fully Annotated Videos

Shuzhi Yu; Guanhang Wu; Chunhui Gu; Mohammed E. Fathy

arXiv:2205.05583·cs.CV·May 12, 2022

TL;DR

This paper introduces a data distillation method that lets one-stage multi-object tracking models learn from data that lacks full tracking annotations, matching the accuracy of two-stage models at higher speed.

Contribution

The paper uses a teacher embedder, trained on Re-ID datasets, to generate pseudo appearance-embedding labels for detection datasets, then trains a detector on the augmented data. This reduces annotation cost while maintaining tracking performance.

Findings

01. The one-stage tracker matches its two-stage counterpart in tracking quality.

02. The proposed method is three times faster than traditional two-stage approaches.

03. The method achieves competitive results without fully annotated tracking data.

Abstract

Recently, one-stage trackers that use a joint model to predict both detections and appearance embeddings in one forward pass have received much attention and achieved state-of-the-art results on the Multi-Object Tracking (MOT) benchmarks. However, their success depends on the availability of videos that are fully annotated with tracking data, which are expensive and hard to obtain. This can limit model generalization. In comparison, the two-stage approach, which performs detection and embedding separately, is slower but easier to train because its data are easier to annotate. We propose to combine the best of the two worlds through a data distillation approach. Specifically, we use a teacher embedder, trained on Re-ID datasets, to generate pseudo appearance embedding labels for the detection datasets. Then, we use the augmented dataset to train a detector that is also capable of regressing…
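The distillation step described in the abstract can be illustrated with a small, dependency-free sketch. Everything here is an assumption made for illustration: the teacher is a toy random projection standing in for a Re-ID-trained embedder, the names `make_teacher`, `add_pseudo_labels`, and `embedding_loss` are hypothetical, and the cosine-distance loss is one plausible choice, since the abstract is cut off before it states how the detector's embeddings are supervised.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def crop(image, box):
    """Return the pixels inside box = (x0, y0, x1, y1), flattened row by row."""
    x0, y0, x1, y1 = box
    return [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]


def make_teacher(dim=4, seed=0):
    """Hypothetical stand-in for a teacher embedder trained on Re-ID data.

    It projects four simple crop statistics through fixed random weights.
    A real teacher would be a trained network; this is only a placeholder.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(dim)]

    def teacher(pixels):
        mean = sum(pixels) / len(pixels)
        var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
        feats = [mean, math.sqrt(var), min(pixels), max(pixels)]
        return l2_normalize(
            [sum(w * f for w, f in zip(row, feats)) for row in weights]
        )

    return teacher


def add_pseudo_labels(dataset, teacher):
    """Attach one teacher embedding per annotated box to each sample."""
    augmented = []
    for sample in dataset:
        embeddings = [teacher(crop(sample["image"], b)) for b in sample["boxes"]]
        augmented.append({**sample, "embeddings": embeddings})
    return augmented


def embedding_loss(student, target):
    """Assumed distillation loss: 1 - cosine similarity (0 when aligned)."""
    s, t = l2_normalize(student), l2_normalize(target)
    return 1.0 - sum(a * b for a, b in zip(s, t))


# Toy detection dataset: one 4x4 grayscale image with two annotated boxes
# and no track identities.
image = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]
dataset = [{"image": image, "boxes": [(0, 0, 2, 2), (2, 2, 4, 4)]}]

teacher = make_teacher(dim=4, seed=0)
augmented = add_pseudo_labels(dataset, teacher)

# A student detector would regress an embedding per box; here a random
# vector plays that role so the loss can be evaluated.
student_rng = random.Random(1)
student_pred = [student_rng.uniform(-1, 1) for _ in range(4)]
loss = embedding_loss(student_pred, augmented[0]["embeddings"][0])
```

In this sketch the detection dataset needs only boxes; the embedding targets come entirely from the teacher, which is the point of the approach as the abstract describes it.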


Taxonomy

Topics: Video Surveillance and Tracking Methods · Face recognition and analysis