Multi-Object Tracking as Attention Mechanism
Hiroshi Fukui, Taiki Miyagawa, Yusuke Morishita

TL;DR
This paper introduces TicrossNet, a fast, end-to-end multi-object tracking model that uses a simple cross-attention mechanism, achieving real-time performance without complex modules like Kalman filters or graph networks.
Contribution
The paper presents TicrossNet, a novel MOT model that is simpler and faster than conventional multi-step trackers and more robust to the number of instances, eliminating the need for traditional tracking modules such as the Kalman filter and the Hungarian algorithm.
Findings
Achieves 32.6 FPS on MOT17 and 31.0 FPS on MOT20.
Runs in real time with as many as 100 instances per frame.
Robust to the number of tracked objects, so the size of the base detector does not need to change with it.
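To make the reported throughput concrete, the sketch below converts FPS into average time per frame. The 30 FPS "real-time" threshold is an assumption for illustration, since the source does not define real time. The FPS figures are the ones reported above (measured on a Tesla V100, per the abstract), and `frame_time_ms` and `REALTIME_FPS` are hypothetical names.

```python
REALTIME_FPS = 30.0  # assumption: "real time" means keeping up with a 30 FPS video stream


def frame_time_ms(fps: float) -> float:
    """Average wall-clock time per frame, in milliseconds, at a given throughput."""
    return 1000.0 / fps


budget_ms = frame_time_ms(REALTIME_FPS)  # about 33.3 ms per frame
for benchmark, fps in [("MOT17", 32.6), ("MOT20", 31.0)]:
    ms = frame_time_ms(fps)
    print(f"{benchmark}: {fps} FPS -> {ms:.1f} ms/frame, within budget: {ms <= budget_ms}")
# MOT17: 32.6 FPS -> 30.7 ms/frame, within budget: True
# MOT20: 31.0 FPS -> 32.3 ms/frame, within budget: True
```

Both reported rates leave only a few milliseconds of slack against a 30 FPS stream, which is why the small per-frame overhead of the tracking step matters.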
Abstract
We propose a conceptually simple and thus fast multi-object tracking (MOT) model that does not require any attached modules, such as the Kalman filter, Hungarian algorithm, transformer blocks, or graph networks. Conventional MOT models are built upon the multi-step modules listed above, and thus their computational cost is high. Our proposed end-to-end MOT model, TicrossNet, is composed of only a base detector and a cross-attention module. As a result, the overhead of tracking does not increase significantly even when the number of instances increases. We show that TicrossNet runs in real time; specifically, it achieves 32.6 FPS on MOT17 and 31.0 FPS on MOT20 (Tesla V100), which includes as many as 100 instances per frame. We also demonstrate that TicrossNet is robust to the number of instances; thus, it does not have to change the size of the base detector depending on that number,…
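The abstract states that TicrossNet consists of only a base detector and a cross-attention module, with no Hungarian algorithm. The plain-Python sketch below illustrates the general idea of attention-based association: scaled dot-product cross-attention between per-instance embeddings of the current and previous frames, a softmax over previous instances, and a greedy argmax match. It is not the paper's formulation. The embedding vectors, the `threshold` cutoff, the greedy matching rule, and the function names `cross_attention` and `associate` are all hypothetical assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Sequence[Vector], keys: Sequence[Vector]) -> List[List[float]]:
    """Scaled dot-product attention weights: row i is a distribution over keys.

    Cost is O(len(queries) * len(keys) * d), i.e. one small matrix product.
    """
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]


def associate(
    curr_feats: Sequence[Vector],
    prev_feats: Sequence[Vector],
    prev_ids: Sequence[int],
    next_id: int,
    threshold: float = 0.5,  # hypothetical confidence cutoff for accepting a match
) -> Tuple[List[int], int]:
    """Assign track IDs to current instances via attention over previous ones.

    Each current instance takes the ID of the previous instance it attends to
    most, if that weight clears `threshold` and the ID is still unused;
    otherwise it starts a new track. Returns (ids, updated next_id).
    """
    if not curr_feats:
        return [], next_id
    if not prev_feats:
        count = len(curr_feats)
        return list(range(next_id, next_id + count)), next_id + count
    ids = []
    used = set()
    for row in cross_attention(curr_feats, prev_feats):
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] >= threshold and prev_ids[best] not in used:
            ids.append(prev_ids[best])
            used.add(prev_ids[best])
        else:
            ids.append(next_id)  # no confident match: start a new track
            next_id += 1
    return ids, next_id


# Usage: two instances reappear in swapped order; a third matches nothing.
prev = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
curr = [[0.0, 10.0, 0.0], [10.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
ids, next_id = associate(curr, prev, prev_ids=[7, 8, 9], next_id=10)
print(ids, next_id)  # [8, 7, 10] 11
```

In this sketch the whole association step is one small matrix product plus a row-wise softmax, with cost O(N_curr · N_prev · d); this is one plausible reason an attention-based approach adds little overhead as the number of instances grows. The greedy rule trades the global optimality a Hungarian assignment would guarantee for simplicity and speed.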
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Chemical Sensor Technologies · Advanced Image and Video Retrieval Techniques
Methods: Softmax · Concatenated Skip Connection · Balanced Selection
