MOT FCG++: Enhanced Representation of Spatio-temporal Motion and Appearance Features
Yanzhao Fang

TL;DR
This paper introduces MOT FCG++, a novel multi-object tracking method that enhances spatial-temporal motion and appearance feature representations, leading to improved tracking accuracy and robustness across multiple datasets.
Contribution
It proposes Diagonal Modulated GIoU and Mean Constant Velocity Modeling for better motion representation, and a dynamic appearance feature that incorporates confidence, advancing the state-of-the-art in MOT.
Findings
Achieved 63.1 HOTA on the MOT17 test set.
Improved MOTA and IDF1 scores over the baseline.
Performed competitively on the MOT20 and DanceTrack datasets.
Abstract
The goal of multi-object tracking (MOT) is to detect and track all objects in a scene across frames, while maintaining a unique identity for each object. Most existing methods rely on the spatial-temporal motion features and appearance embedding features of the detected objects in consecutive frames. Effectively and robustly representing the spatial and appearance features of long trajectories has become a critical factor affecting the performance of MOT. We propose a novel approach for appearance and spatial-temporal motion feature representation, improving upon the hierarchical clustering association method MOT FCG. For spatial-temporal motion features, we first propose Diagonal Modulated GIoU, which more accurately represents the relationship between the position and shape of the objects. Second, Mean Constant Velocity Modeling is proposed to reduce the effect of observation noise on…
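For context on the motion term named in the abstract, the sketch below computes the standard Generalized IoU (GIoU) between two axis-aligned boxes. It is only the well-known baseline metric, not the paper's Diagonal Modulated GIoU, whose modulation term is not given in this summary. The function name `giou` and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration.

```python
def giou(box_a, box_b):
    """Standard GIoU for two boxes given as (x1, y1, x2, y2).

    Illustrative baseline only: the paper's Diagonal Modulated GIoU
    adds a modulation that this summary does not specify.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle, clamped to zero when the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # Degenerate (zero-area) inputs: return 0.0 by convention (an assumption).
    if union <= 0 or enclose <= 0:
        return 0.0

    iou = inter / union
    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose
```

Unlike plain IoU, this score stays informative for non-overlapping boxes: it ranges from -1 to 1 and decreases as the boxes move apart, which is why GIoU-style terms are a common starting point for association costs in tracking.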
Taxonomy
Topics: Face recognition and analysis
