MeMOT: Multi-Object Tracking with Memory
Jiarui Cai, Mingze Xu, Wei Li, Yuanjun Xiong, Wei Xia, Zhuowen Tu, Stefano Soatto

TL;DR
MeMOT introduces a Transformer-based online multi-object tracking framework that leverages a large spatio-temporal memory to improve long-term object association and detection.
Contribution
It presents a novel memory-augmented Transformer architecture for integrated detection and data association in multi-object tracking.
Findings
MeMOT achieves competitive performance on widely adopted MOT benchmarks.
It links objects across long time spans by referencing stored identity embeddings.
It handles detection and data association within a single framework.
Abstract
We propose an online tracking algorithm that performs object detection and data association under a common framework, capable of linking objects after a long time span. This is realized by preserving a large spatio-temporal memory to store the identity embeddings of the tracked objects, and by adaptively referencing and aggregating useful information from the memory as needed. Our model, called MeMOT, consists of three main modules that are all Transformer-based: 1) Hypothesis Generation, which produces object proposals in the current video frame; 2) Memory Encoding, which extracts the core information from the memory for each tracked object; and 3) Memory Decoding, which solves the object detection and data association tasks simultaneously for multi-object tracking. When evaluated on widely adopted MOT benchmark datasets, MeMOT achieves very competitive performance.
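The idea of storing identity embeddings and aggregating them on demand can be illustrated with a small sketch. This is not the authors' implementation: MeMOT uses learned Transformer modules, whereas the code below assumes a plain fixed-length buffer per track and a single scaled dot-product attention read with no learned projections. The class name `TrackMemory`, the `max_len` parameter, and the `write`/`read` methods are all hypothetical names chosen for illustration.

```python
import math
from collections import deque


class TrackMemory:
    """Hypothetical per-track buffer of identity embeddings (illustrative only)."""

    def __init__(self, max_len=4):
        # Oldest embeddings are dropped once the buffer is full.
        self.buffer = deque(maxlen=max_len)

    def write(self, embedding):
        """Store the embedding observed for this track in the current frame."""
        self.buffer.append(list(embedding))

    def read(self, query):
        """Aggregate stored embeddings with scaled dot-product attention.

        Assumption: keys and values are the raw stored embeddings; a real
        model would apply learned projections first.
        """
        if not self.buffer:
            return list(query)
        scale = math.sqrt(len(query))
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in self.buffer
        ]
        # Subtract the maximum score for a numerically stable softmax.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        dim = len(query)
        return [
            sum(w * emb[i] for w, emb in zip(weights, self.buffer))
            for i in range(dim)
        ]


memory = TrackMemory(max_len=4)
memory.write([1.0, 0.0])
memory.write([0.0, 1.0])
aggregated = memory.read([10.0, 0.0])  # query resembling the first embedding
```

With a query close to the first stored embedding, the attention weights concentrate on that entry, so `aggregated` is dominated by `[1.0, 0.0]`. The fixed buffer length stands in for the "large spatio-temporal memory" only loosely; how MeMOT sizes and updates its memory is not specified in the abstract.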
Taxonomy
Topics: Video Surveillance and Tracking Methods · Fire Detection and Safety Systems · Advanced Neural Network Applications
