Two is a crowd: tracking relations in videos
Artem Moskalev, Ivan Sosnovik, Arnold Smeulders

TL;DR
This paper introduces a Relation Encoding Module (REM), a plug-in that encodes inter-object relations to improve multi-object tracking in crowded scenes, particularly when occlusions make individual cues unreliable.
Contribution
The paper presents a novel plug-in module that encodes relations between objects using message passing on spatio-temporal graphs, improving existing trackers in crowded scenarios.
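To make the idea of message passing on a spatio-temporal graph concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names (`build_edges`, `message_passing`), the distance threshold `radius`, the mean aggregation, the `tanh` nonlinearity, and the random weights are all assumptions chosen for illustration. Nodes are (frame, object) pairs; temporal edges link the same object across consecutive frames, and spatial edges link nearby objects within a frame.

```python
import numpy as np

def build_edges(tracks, radius):
    """Build directed edges for a toy spatio-temporal graph (hypothetical helper).

    tracks: array of shape (T, N, 2) with (x, y) positions of N objects over T frames.
    Node index for object i at frame t is t * N + i.
    """
    T, N, _ = tracks.shape
    edges = []
    for t in range(T):
        for i in range(N):
            # Temporal edges: same object in consecutive frames, both directions.
            if t > 0:
                edges.append(((t - 1) * N + i, t * N + i))
                edges.append((t * N + i, (t - 1) * N + i))
            # Spatial edges: other objects within `radius` in the same frame.
            for j in range(N):
                if i != j and np.linalg.norm(tracks[t, i] - tracks[t, j]) <= radius:
                    edges.append((t * N + j, t * N + i))
    return edges

def message_passing(node_feats, edges, w_in, w_self, w_msg, rounds=2):
    """Run a few rounds of mean-aggregation message passing (assumed update rule)."""
    n = node_feats.shape[0]
    adj = np.zeros((n, n))
    for src, dst in edges:
        adj[dst, src] = 1.0
    # Row-normalize so each node averages over its incoming neighbors.
    deg = adj.sum(axis=1, keepdims=True)
    norm_adj = adj / np.maximum(deg, 1.0)
    h = np.tanh(node_feats @ w_in)
    for _ in range(rounds):
        h = np.tanh(h @ w_self + norm_adj @ h @ w_msg)
    return h

# Toy data: objects 0 and 1 move together; object 2 is far away.
tracks = np.array([
    [[0.0, 0.0], [1.0, 0.0], [50.0, 50.0]],
    [[1.0, 0.0], [2.0, 0.0], [50.0, 51.0]],
    [[2.0, 0.0], [3.0, 0.0], [50.0, 52.0]],
])
T, N, _ = tracks.shape
dim = 8  # embedding size, chosen arbitrarily for this sketch

rng = np.random.default_rng(0)
w_in = rng.normal(scale=0.5, size=(2, dim))
w_self = rng.normal(scale=0.5, size=(dim, dim))
w_msg = rng.normal(scale=0.5, size=(dim, dim))

edges = build_edges(tracks, radius=5.0)
emb = message_passing(tracks.reshape(T * N, 2), edges, w_in, w_self, w_msg)
relation_embeddings = emb.reshape(T, N, dim)  # one embedding per object per frame
```

In a plug-in setting, embeddings like `relation_embeddings` would be passed to an existing tracker as an additional cue alongside its per-object features; how REM actually combines them is described in the paper, not in this sketch.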
Findings
Improved tracking accuracy on MOT17 and MOT20 datasets.
Enhanced occlusion handling through relational cues.
Baseline tracker performance increases with REM integration.
Abstract
Tracking multiple objects individually differs from tracking groups of related objects. When an object is part of a group, its trajectory depends on the trajectories of the other group members. Most current state-of-the-art trackers track each object independently and add a mechanism to handle overlapping trajectories where necessary. Such an approach does not take inter-object relations into account, which may cause unreliable tracking for group members, especially in crowded scenarios, where individual cues become unreliable due to occlusions. To overcome these limitations and to extend such trackers to crowded scenes, we propose a plug-in Relation Encoding Module (REM). REM encodes relations between tracked objects by running message passing over a corresponding spatio-temporal graph, computing relation embeddings for the tracked…
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Anomaly Detection Techniques and Applications
Methods: Convolution · Dense Connections · Q-Learning · Deep Q-Network · Random Ensemble Mixture
