TL;DR
ReaMOT introduces a reasoning-based multi-object tracking task and benchmark that emphasizes logical inference over explicit visual-textual matching, and proposes ReaTrack, a training-free framework that improves RHOTA more than threefold on high-level reasoning scenarios.
Contribution
The paper defines a new reasoning-based multi-object tracking task (ReaMOT), constructs a large-scale benchmark dataset with a tailored metric suite, and proposes ReaTrack, a training-free framework that strengthens reasoning capabilities.
Findings
ReaTrack achieves a more than threefold improvement in RHOTA on high-level reasoning scenarios.
The ReaMOT dataset includes 1,156 language instructions, 423,359 image-language pairs, and 869 video sequences across six evaluation scenarios.
ReaTrack outperforms traditional trackers and LVLM-based methods on the ReaMOT benchmark.
Abstract
Referring Multi-Object Tracking (RMOT) aims to track targets specified by language instructions. However, existing RMOT paradigms rely heavily on explicit visual-textual matching and consequently fail to generalize to complex instructions that require logical reasoning. To overcome this, we propose Reasoning-based Multi-Object Tracking (ReaMOT), a novel task that elevates tracking to a cognitive level, requiring models to infer and track specific targets satisfying implicit constraints via logical reasoning. To advance this field, we construct the ReaMOT Challenge, a comprehensive benchmark featuring a tailored metric suite and a large-scale dataset. This dataset comprises 1,156 language instructions, 423,359 image-language pairs, and 869 distinct video sequences systematically categorized into six distinct evaluation scenarios, with over 75% of the instructions dedicated to High Level…
