Siamese-DETR for Generic Multi-Object Tracking
Qiankun Liu, Yichen Li, Yuqi Jiang, Ying Fu

TL;DR
Siamese-DETR introduces a simple, training-efficient approach for generic multi-object tracking that leverages object queries in DETR, eliminating complex data association and surpassing existing methods on benchmark datasets.
Contribution
The paper proposes Siamese-DETR, a GMOT method built on DETR object queries. It is trained only on detection datasets, simplifies tracking, and improves performance over prior GMOT methods.
Findings
Outperforms existing GMOT methods on the GMOT-40 dataset
Trains only on detection datasets such as COCO
Simplifies online tracking with a query-based approach
Abstract
The ability to detect and track dynamic objects in different scenes is fundamental to real-world applications, e.g., autonomous driving and robot navigation. However, traditional Multi-Object Tracking (MOT) is limited to tracking objects that belong to pre-defined closed-set categories. Recently, Open-Vocabulary MOT (OVMOT) and Generic MOT (GMOT) have been proposed to track objects of interest beyond pre-defined categories, given a text prompt and a template image, respectively. However, training OVMOT models requires expensive, well pre-trained (vision-)language models and fine-grained category annotations. In this paper, we focus on GMOT and propose a simple but effective method, Siamese-DETR. Only commonly used detection datasets (e.g., COCO) are required for training. Different from existing GMOT methods, which train a Single Object Tracking (SOT) based detector to detect…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Video Surveillance and Tracking Methods
Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Linear Layer · Residual Connection · Byte Pair Encoding · Softmax · Dropout · Adam · Position-Wise Feed-Forward Layer
