Spatio-temporal Graph Learning on Adaptive Mined Key Frames for High-performance Multi-Object Tracking
Futian Wang, Fengxiang Liu, Xiao Wang

TL;DR
This paper introduces a novel multi-object tracking method that uses adaptive key frame mining and intra-frame graph-based feature fusion to improve tracking accuracy, especially under occlusion conditions.
Contribution
It proposes a reinforcement learning-based Key Frame Extraction (KFE) module and a graph convolutional network for intra-frame feature fusion, which together improve object association and occlusion handling in tracking.
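As a rough illustration of graph-based intra-frame feature fusion, the sketch below runs one graph-convolution step over the detections in a single frame. It is not the authors' implementation. The IoU-based edge rule, the `iou_threshold` default, the symmetric degree normalization, and the names `box_iou` and `gcn_fuse` are all assumptions made for this example.

```python
import math


def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def gcn_fuse(features, boxes, weight, iou_threshold=0.1):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    features: n rows of d_in floats (one appearance vector per detection)
    boxes:    n boxes as (x1, y1, x2, y2)
    weight:   d_in x d_out matrix (learned in practice, fixed here)
    Assumption: two detections are connected when their IoU exceeds
    iou_threshold; the paper may build its graph differently.
    """
    n = len(boxes)
    # Adjacency with self-loops, so every node keeps its own feature.
    adj = [[1.0 if i == j or box_iou(boxes[i], boxes[j]) > iou_threshold else 0.0
            for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in adj]  # at least 1 because of the self-loop
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

    d_in, d_out = len(weight), len(weight[0])
    fused = []
    for i in range(n):
        # Aggregate neighbor features, then apply the linear map and ReLU.
        agg = [sum(norm[i][j] * features[j][k] for j in range(n))
               for k in range(d_in)]
        fused.append([max(0.0, sum(agg[k] * weight[k][m] for k in range(d_in)))
                      for m in range(d_out)])
    return fused
```

With an identity `weight`, two fully overlapping detections end up with the average of their features, while two detections that do not overlap keep their features unchanged. This shows how the graph lets a detection borrow context from the detections it overlaps with.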
Findings
Achieves 68.6 HOTA on the MOT17 dataset
Improves IDF1 to 81.0, reducing ID switches
Outperforms existing methods in occlusion scenarios
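For readers unfamiliar with the two metrics named in the findings, the sketch below shows how they are commonly defined: IDF1 is the F1 score over identity-matched detections, and HOTA at a single localization threshold is the geometric mean of detection accuracy (DetA) and association accuracy (AssA). The full HOTA score averages this value over a range of thresholds. The function names are assumptions for this example, and any counts passed in are hypothetical inputs, not figures from the paper.

```python
import math


def idf1(idtp, idfp, idfn):
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN), returned as a fraction in [0, 1]."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


def hota_at_threshold(det_a, ass_a):
    """HOTA at one localization threshold: sqrt(DetA * AssA)."""
    return math.sqrt(det_a * ass_a)


def hota(det_a_per_threshold, ass_a_per_threshold):
    """Average the per-threshold HOTA values (inputs are equal-length lists)."""
    scores = [hota_at_threshold(d, a)
              for d, a in zip(det_a_per_threshold, ass_a_per_threshold)]
    return sum(scores) / len(scores) if scores else 0.0
```

Because HOTA multiplies DetA and AssA, a tracker cannot reach a high score by detecting well while associating poorly, or the reverse.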
Abstract
In the realm of multi-object tracking, the challenge of accurately capturing the spatial and temporal relationships between objects in video sequences remains a significant hurdle. This is further complicated by frequent occurrences of mutual occlusions among objects, which can lead to tracking errors and reduced performance in existing methods. Motivated by these challenges, we propose a novel adaptive key frame mining strategy that addresses the limitations of current tracking approaches. Specifically, we introduce a Key Frame Extraction (KFE) module that leverages reinforcement learning to adaptively segment videos, thereby guiding the tracker to exploit the intrinsic logic of the video content. This approach allows us to capture structured spatial relationships between different objects as well as the temporal relationships of objects across frames. To tackle the issue of object…
Taxonomy
Topics: Video Surveillance and Tracking Methods · Fire Detection and Safety Systems · IoT-based Smart Home Systems
