TL;DR
MeMoSORT introduces a real-time multi-object tracking method that uses memory-augmented neural networks and adaptive IoU to improve accuracy in complex scenarios with occlusions and motion mismatches.
Contribution
The paper presents MeMoSORT, a novel online MOT tracker combining memory-assisted Kalman filtering and motion-adaptive IoU for enhanced tracking performance.
Findings
Achieves state-of-the-art HOTA scores on DanceTrack and SportsMOT datasets.
Effectively reduces identity switches and target loss in occlusion scenarios.
Outperforms conventional methods in complex human-dominant tracking environments.
Abstract
Multi-object tracking (MOT) in human-dominant scenarios, which involves continuously tracking multiple people within video sequences, remains a significant challenge in computer vision due to targets' complex motion and severe occlusions. Conventional tracking-by-detection methods are fundamentally limited by their reliance on the Kalman filter (KF) and rigid Intersection over Union (IoU)-based association. The KF motion model often mismatches real-world object dynamics, causing filtering errors, while rigid association struggles under occlusions, leading to identity switches or target loss. To address these issues, we propose MeMoSORT, a simple, online, and real-time MOT tracker with two key innovations. First, the Memory-assisted Kalman filter (MeKF) uses memory-augmented neural networks to compensate for mismatches between assumed and actual object motion. Second, the Motion-adaptive…
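The MeKF idea of compensating for a mismatched motion model can be illustrated with a toy 1-D predictor. This is a minimal sketch under stated assumptions, not the paper's method. Only the prediction step is shown, with no covariance or Kalman gain. A moving average of past residuals (the hypothetical `MemoryCompensatedPredictor` class) stands in for the LSTM-based memory network, and the trajectory is made-up toy data.

```python
from collections import deque


class MemoryCompensatedPredictor:
    """Toy 1-D predictor: constant-velocity prediction plus a memory-based correction.

    The moving average of past residuals is a hypothetical stand-in for the
    LSTM used by MeKF; it is not the paper's network.
    """

    def __init__(self, memory_len=5):
        # Recent residuals (measured position - constant-velocity prediction).
        self.residuals = deque(maxlen=memory_len)

    def predict(self, pos, vel):
        # Constant-velocity model with dt = 1 frame: x_{t+1} = x_t + v_t.
        base = pos + vel
        # Correction drawn from memory; zero until residuals have been stored.
        correction = sum(self.residuals) / len(self.residuals) if self.residuals else 0.0
        return base, base + correction

    def observe(self, base_prediction, measured):
        # Remember how far the plain motion model missed, for future corrections.
        self.residuals.append(measured - base_prediction)


track = [0, 1, 3, 6, 10, 15, 21]  # toy accelerating target (not real data)
predictor = MemoryCompensatedPredictor(memory_len=3)
base_err, comp_err = [], []
for t in range(1, len(track) - 1):
    pos, vel = track[t], track[t] - track[t - 1]  # finite-difference velocity
    base, compensated = predictor.predict(pos, vel)
    measured = track[t + 1]
    base_err.append(abs(measured - base))
    comp_err.append(abs(measured - compensated))
    predictor.observe(base, measured)

print(base_err)  # [1, 1, 1, 1, 1]
print(comp_err)  # [1.0, 0.0, 0.0, 0.0, 0.0]
```

The constant-velocity model consistently lags the accelerating target by one unit. After one observation, the remembered residual absorbs that systematic error. A learned memory network would aim to capture richer patterns than this constant offset.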
Peer Reviews
Decision: Submitted to ICLR 2026
This paper introduces MeMoSORT, a method for multi-object tracking (MOT). It presents two main innovations: (a) a Memory-assisted Kalman Filter (MeKF), which employs a memory-augmented (LSTM-based) neural network to bridge the gap between assumed and actual motion patterns; and (b) a Motion-adaptive IoU (MoIoU), which dynamically expands the matching region and integrates height similarity to reduce association errors. It shows state-of-the-art results on the DanceTrack…
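An expanded-region IoU blended with a height-similarity term can be sketched as follows. This is a hypothetical formulation, not the paper's MoIoU. The box expansion about the centre, the vertical-overlap definition of height IoU, and the linear blend in `mo_iou` are all assumptions made for illustration. Boxes are `(x1, y1, x2, y2)`.

```python
def iou(a, b):
    # Standard IoU of two axis-aligned boxes (x1, y1, x2, y2).
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def expand(box, scale):
    # Enlarge width and height by `scale` around the box centre.
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    w, h = (box[2] - box[0]) * scale, (box[3] - box[1]) * scale
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def expansion_iou(a, b, scale):
    # IoU computed on enlarged boxes, so nearby but non-overlapping boxes can still match.
    return iou(expand(a, scale), expand(b, scale))


def height_iou(a, b):
    # Assumed definition: overlap of vertical extents over their combined extent.
    inter = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    union = max(a[3], b[3]) - min(a[1], b[1])
    return inter / union if union > 0 else 0.0


def mo_iou(a, b, scale, height_weight):
    # Hypothetical linear blend of the two terms; not the paper's exact formula.
    return (1 - height_weight) * expansion_iou(a, b, scale) + height_weight * height_iou(a, b)


a = (0.0, 0.0, 10.0, 20.0)   # predicted track box
b = (12.0, 0.0, 22.0, 20.0)  # detection shifted right, no overlap with a
print(iou(a, b))  # 0.0
print(round(expansion_iou(a, b, 1.5), 4))  # 0.1111
print(height_iou(a, b))  # 1.0
print(round(mo_iou(a, b, scale=1.5, height_weight=0.5), 4))  # 0.5556
```

Plain IoU gives zero for the shifted detection, so a rigid threshold would drop the match. The expanded and height-aware score still gives the pair a usable affinity.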
There are several concerns regarding both novelty and practicality. First, the MeKF component is computationally expensive and difficult to train due to its reliance on LSTMs. Consequently, the paper omits evaluation on more challenging datasets such as MOT20, which significantly limits the strength of the experimental validation. Second, using height as a discriminative feature for association is questionable. Estimating reliable person height from uncalibrated cameras is inherently difficult, …
- This paper is easy to understand.
- The proposed method is easy to understand, as it combines elementary techniques.
- The design of the proposed method is ad hoc.
  - The design of the Expansion IoU and Height IoU techniques is ad hoc.
  - Furthermore, there is no hyperparameter study for the relevant parameters (M, N).
- Insufficient experiments
  - Tracking methods are generally compared comprehensively across multiple metrics such as IDF1 and MOTA.
  - Specifically, these metrics include IDF1, IDP, IDR, Recall, Precision, FP, FN, IDs, FM, MOTA, IDt, IDa, IDm, etc. This paper evaluates only a very limited…
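Two of the metrics named above reduce to simple formulas over error counts. MOTA (CLEAR MOT) subtracts misses, false positives, and identity switches from 1, normalized by the number of ground-truth objects. IDF1 is the F1 score over identity-level true positives, false positives, and false negatives. The counts below are invented for illustration and are not results from the paper. Matching detections to ground truth, which a real evaluator does first, is out of scope here.

```python
def mota(fn, fp, id_switches, num_gt):
    # CLEAR MOT accuracy: 1 - (misses + false positives + ID switches) / GT objects.
    return 1.0 - (fn + fp + id_switches) / num_gt


def id_scores(idtp, idfp, idfn):
    # Identity precision (IDP), recall (IDR) and F1 (IDF1) from identity-level counts.
    idp = idtp / (idtp + idfp)
    idr = idtp / (idtp + idfn)
    idf1 = 2 * idtp / (2 * idtp + idfp + idfn)
    return idp, idr, idf1


# Made-up counts for illustration only.
print(round(mota(fn=100, fp=50, id_switches=10, num_gt=1000), 3))  # 0.84
idp, idr, idf1 = id_scores(idtp=800, idfp=150, idfn=200)
print(round(idp, 3), round(idr, 3), round(idf1, 3))  # 0.842 0.8 0.821
```

IDF1 equals the harmonic mean of IDP and IDR. This is why it rewards trackers that keep the same identity on a target over time rather than merely detecting it in each frame.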
- MeMoSORT achieves SOTA performance across multiple metrics, notably HOTA, on DanceTrack and SportsMOT.
- MoIoU provides a significant advantage in handling severe occlusions. It jointly controls the expansion scale (EIoU) and the height weighting (HIoU) through the Motion-Adaptive Technique (MAT), which uses normalized speeds to select parameters discretely. This adaptive parameter selection makes tracking more robust and accurate than existing fixed-parameter IoU variants.
- The overall presentation…
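Discrete, speed-driven parameter selection of the kind attributed to MAT can be sketched as a lookup over speed bins. Everything specific here is an assumption for illustration: normalizing per-frame displacement by box height, the bin thresholds, and the `(expansion scale, height weight)` values in the hypothetical `SPEED_BINS` table. None of these are the paper's settings.

```python
import math

# Hypothetical bins: (upper bound on normalized speed, expansion scale, height weight).
# Thresholds and values are illustrative, not the paper's settings.
SPEED_BINS = [
    (0.02, 1.1, 0.2),
    (0.05, 1.3, 0.4),
    (math.inf, 1.5, 0.6),
]


def normalized_speed(vx, vy, box_height):
    # Per-frame displacement divided by box height, so the value is scale-invariant.
    return math.hypot(vx, vy) / box_height


def select_params(speed, bins=SPEED_BINS):
    # Discrete selection: return the parameters of the first bin the speed falls into.
    for upper, scale, height_weight in bins:
        if speed < upper:
            return scale, height_weight
    return bins[-1][1], bins[-1][2]


print(select_params(normalized_speed(0.5, 0.0, 200)))    # (1.1, 0.2)
print(select_params(normalized_speed(3.0, 4.0, 200)))    # (1.3, 0.4)
print(select_params(normalized_speed(30.0, 40.0, 200)))  # (1.5, 0.6)
```

A discrete table keeps the association step cheap and predictable. The trade-off, as one review notes, is that bin boundaries and values become hyperparameters that call for a sensitivity study.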
- The entire motivation for the Memory-assisted Kalman Filter (MeKF) rests on overcoming the "fundamental limitations of the linear, first-order Markovian motion model". The authors implement the standard KF with a constant-velocity model for the state transition matrix F. While the paper visually demonstrates that complex human movements (e.g., phased switching, predictable back-and-forth patterns) violate the Markovian assumption, the justification for choosing the most simplistic…
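The mismatch this review describes can be made concrete with a constant-velocity transition matrix F applied to a back-and-forth trajectory. This is a minimal sketch of the KF prediction step only, assuming a 1-D state `[position, velocity]`, dt = 1 frame, and finite-difference velocity estimates. The trajectory is toy data.

```python
# Constant-velocity transition for a 1-D state [position, velocity] with dt = 1 frame.
F = [[1.0, 1.0],
     [0.0, 1.0]]


def mat_vec(m, v):
    # Plain-Python matrix-vector product.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def cv_prediction_errors(track):
    # One-step-ahead position error of the constant-velocity prediction x' = F x.
    errors = []
    for t in range(1, len(track) - 1):
        state = [track[t], track[t] - track[t - 1]]  # position, finite-difference velocity
        predicted_pos = mat_vec(F, state)[0]
        errors.append(abs(track[t + 1] - predicted_pos))
    return errors


back_and_forth = [0, 5, 10, 5, 0, 5, 10]  # toy oscillating trajectory
print(cv_prediction_errors(back_and_forth))  # [0.0, 10.0, 0.0, 10.0, 0.0]
```

The prediction is exact while the target keeps its direction and overshoots by two steps' worth of motion at every reversal. This is the kind of predictable but non-constant-velocity pattern the paper cites. The review's point is that a higher-order motion model is the obvious baseline to compare against before adding a learned component.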
