MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking
Ruopeng Gao, Limin Wang

TL;DR
MeMOTR is a long-term memory-augmented Transformer for multi-object tracking. By modeling long-term temporal information rather than only adjacent frames, it improves target association and surpasses the previous state-of-the-art method on DanceTrack by 7.9% HOTA and 13.0% AssA.
Contribution
The paper presents MeMOTR, a Transformer-based tracker that injects long-term memory through a customized memory-attention layer, making each object's track embedding more stable and distinguishable and thereby improving target association.
Findings
Surpasses the previous state-of-the-art method on DanceTrack by 7.9% in HOTA and 13.0% in AssA.
Outperforms other Transformer-based methods on association performance on MOT17.
Generalizes well on BDD100K dataset.
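To make the relationship between the two reported metrics concrete: at a single localization threshold, HOTA is defined as the geometric mean of detection accuracy (DetA) and association accuracy (AssA), and the reported HOTA score averages this value over a range of thresholds. The sketch below covers only the single-threshold case. The function name and the input values are illustrative assumptions, not results from the paper.

```python
import math


def hota_at_threshold(det_a: float, ass_a: float) -> float:
    """HOTA at one localization threshold: geometric mean of DetA and AssA.

    Both inputs are fractions in [0, 1]. The function name is hypothetical.
    """
    if not (0.0 <= det_a <= 1.0 and 0.0 <= ass_a <= 1.0):
        raise ValueError("DetA and AssA must lie in [0, 1]")
    return math.sqrt(det_a * ass_a)


# Illustrative values only (not taken from the paper).
baseline = hota_at_threshold(det_a=0.64, ass_a=0.36)
better_association = hota_at_threshold(det_a=0.64, ass_a=0.49)
```

With detection held fixed, raising AssA alone raises HOTA, which is why a tracker that mainly improves association can still show a large HOTA gain.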
Abstract
As a video task, Multiple Object Tracking (MOT) is expected to capture temporal information of targets effectively. Unfortunately, most existing methods only explicitly exploit the object features between adjacent frames, while lacking the capacity to model long-term temporal information. In this paper, we propose MeMOTR, a long-term memory-augmented Transformer for multi-object tracking. Our method is able to make the same object's track embedding more stable and distinguishable by leveraging long-term memory injection with a customized memory-attention layer. This significantly improves the target association ability of our model. Experimental results on DanceTrack show that MeMOTR impressively surpasses the state-of-the-art method by 7.9% and 13.0% on HOTA and AssA metrics, respectively. Furthermore, our model also outperforms other Transformer-based methods on association…
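The abstract does not say how the long-term memory is maintained. One simple way to keep a slowly changing summary of a track is an exponential moving average (EMA) of its per-frame embeddings, and the sketch below assumes that update rule purely for illustration. The class name, the `rate` parameter, and its default value are hypothetical, and the sketch does not model the memory-attention layer itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LongTermMemory:
    """Per-track memory kept as an exponential moving average (assumed rule)."""

    rate: float = 0.01  # hypothetical update rate, not taken from the source
    memory: Dict[int, List[float]] = field(default_factory=dict)

    def update(self, track_id: int, embedding: List[float]) -> List[float]:
        """Blend the newest embedding of a track into its stored memory."""
        previous = self.memory.get(track_id)
        if previous is None:
            # First observation of this track: start the memory from it.
            updated = list(embedding)
        else:
            # A small rate makes the memory change slowly, so one noisy
            # frame cannot overwrite what was accumulated earlier.
            updated = [
                (1.0 - self.rate) * old + self.rate * new
                for old, new in zip(previous, embedding)
            ]
        self.memory[track_id] = updated
        return updated


# Usage: two observations of track 7 with an exaggerated rate of 0.5.
mem = LongTermMemory(rate=0.5)
first = mem.update(7, [1.0, 0.0])
second = mem.update(7, [0.0, 1.0])
```

A small update rate trades responsiveness for stability: the stored vector reflects many past frames, which matches the abstract's goal of a more stable track embedding, at the cost of reacting slowly to genuine appearance changes.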
Taxonomy
Topics: Video Surveillance and Tracking Methods · Fire Detection and Safety Systems · Air Quality Monitoring and Forecasting
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Dense Connections · Linear Layer · Dropout · Adam · Label Smoothing · Absolute Position Encodings · Byte Pair Encoding
