Is a Pure Transformer Effective for Separated and Online Multi-Object Tracking?
Chongwei Liu, Haojie Li, Zhihui Wang, Rui Xu

TL;DR
This paper introduces a novel Pure Transformer (PuTR) model for online multi-object tracking, effectively unifying short- and long-term association tasks with improved domain adaptation and practical efficiency.
Contribution
It proposes representing trajectory graphs as directed acyclic graphs and leveraging Transformer attention mechanisms, creating a new unified approach for online MOT.
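As a rough illustration of how a directed acyclic structure can map onto attention, the sketch below applies single-head scaled dot-product attention to a frame-ordered object sequence, with a mask that lets each object attend only to objects from strictly earlier frames. The function names, the masking rule, and the random embeddings are assumptions made for this example; the listing does not specify the mask PuTR actually uses.

```python
import numpy as np

def frame_causal_mask(frame_ids):
    """mask[i, j] is True when object j lies in a strictly earlier frame than object i."""
    f = np.asarray(frame_ids)
    return f[:, None] > f[None, :]

def masked_attention(Q, K, V, mask):
    """Single-head scaled dot-product attention restricted by a boolean mask."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    # Rows without any allowed key (objects in the first frame) get zero weights.
    row_has_key = mask.any(axis=1, keepdims=True)
    scores = np.where(row_has_key, scores, 0.0)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.where(mask, np.exp(scores), 0.0)
    denom = weights.sum(axis=1, keepdims=True)
    weights = np.divide(weights, denom, out=np.zeros_like(weights), where=denom > 0)
    return weights @ V, weights

# Hypothetical sequence: two objects in frame 0, two in frame 1, one in frame 2.
frame_ids = [0, 0, 1, 1, 2]
rng = np.random.default_rng(0)
X = rng.normal(size=(len(frame_ids), 8))  # stand-in object embeddings
mask = frame_causal_mask(frame_ids)
out, weights = masked_attention(X, X, X, mask)
```

Because the mask only admits edges from earlier frames to later ones, the attention pattern cannot contain a cycle, which is the property that makes the directed acyclic view compatible with an online setting.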
Findings
PuTR outperforms existing methods on multiple datasets.
The approach demonstrates superior domain adaptation.
Efficient training and inference enable practical deployment.
Abstract
Recent advances in Multi-Object Tracking (MOT) have demonstrated significant success in short-term association within the separated tracking-by-detection online paradigm. However, long-term tracking remains challenging. While graph-based approaches address this by modeling trajectories as global graphs, these methods are unsuitable for real-time applications due to their non-online nature. In this paper, we review the concept of trajectory graphs and propose a novel perspective by representing them as directed acyclic graphs. This representation can be described using frame-ordered object sequences and binary adjacency matrices. We observe that this structure naturally aligns with Transformer attention mechanisms, enabling us to model the association problem using a classic Transformer architecture. Based on this insight, we introduce a concise Pure Transformer (PuTR) to validate the…
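To make the representation in the abstract concrete, here is a minimal sketch that builds a binary adjacency matrix from a frame-ordered object sequence. The helper `trajectory_adjacency`, the toy track IDs, and the choice to link each object only to its immediate predecessor in the same trajectory are assumptions for illustration, not the paper's stated construction.

```python
import numpy as np

def trajectory_adjacency(frame_ids, track_ids):
    """Binary adjacency matrix for a frame-ordered object sequence.

    A[i, j] = 1 when object j is the most recent earlier object of the
    same trajectory as object i (assumed convention for this sketch).
    """
    frame_ids = np.asarray(frame_ids)
    track_ids = np.asarray(track_ids)
    if np.any(np.diff(frame_ids) < 0):
        raise ValueError("objects must be sorted by frame index")
    n = len(frame_ids)
    A = np.zeros((n, n), dtype=np.uint8)
    last_seen = {}  # track id -> index of its latest object so far
    for i in range(n):
        tid = int(track_ids[i])
        if tid in last_seen:
            A[i, last_seen[tid]] = 1
        last_seen[tid] = i
    return A

# Toy sequence: tracks 7 and 9 appear in frames 0 and 1; track 7 continues to frame 2.
frame_ids = [0, 0, 1, 1, 2]
track_ids = [7, 9, 7, 9, 7]
A = trajectory_adjacency(frame_ids, track_ids)

# Every edge points from a later object back to an earlier one, so A is
# strictly lower triangular and the graph it encodes has no cycles.
is_acyclic = np.array_equal(A, np.tril(A, k=-1))
```

Under this convention the matrix has the same shape as an attention map over the sequence, which is what allows association to be read off as a row-wise choice among earlier objects.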
Taxonomy
Topics: Video Surveillance and Tracking Methods
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Label Smoothing · Adam · Absolute Position Encodings · Dropout
