ReaMOT: A Benchmark and Framework for Reasoning-based Multi-Object Tracking

Sijia Chen; Yanqiu Yu; En Yu; Wenbing Tao

arXiv:2505.20381 · cs.CV · May 12, 2026

TL;DR

ReaMOT introduces a reasoning-based multi-object tracking task and benchmark that emphasize logical inference over explicit visual-textual matching. It also proposes ReaTrack, a training-free framework that substantially improves tracking performance on instructions requiring high-level reasoning.

Contribution

The paper defines a new task, Reasoning-based Multi-Object Tracking (ReaMOT), builds a large-scale benchmark dataset for it, and proposes ReaTrack, a framework that strengthens reasoning ability without any training.

Findings

01

ReaTrack achieves more than a threefold improvement in RHOTA on high-level reasoning scenarios.

02

The ReaMOT dataset includes 1,156 instructions and 423,359 image-language pairs across six evaluation scenarios.

03

ReaTrack outperforms conventional trackers and methods built on large vision-language models (LVLMs) on the ReaMOT benchmark.
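Finding 01 is reported in RHOTA, which this page does not define. Assuming RHOTA keeps the structure of the standard HOTA tracking metric, where the score at each localization threshold α is the geometric mean of detection accuracy (DetA) and association accuracy (AssA), the per-threshold computation can be sketched as follows. This is an illustrative sketch only, not the benchmark's official evaluation code: the function names hota_alpha and hota and the per-frame input layout are hypothetical, and the matching of predictions to ground truth at each α (normally a Hungarian assignment on IoU) is assumed to have been done already.

```python
from collections import Counter
from math import sqrt

# Hypothetical sketch: assumes RHOTA follows the standard HOTA form.
# Each frame is (gt_ids, pred_ids, matches), where matches lists the
# (gt_id, pred_id) pairs already matched at one localization threshold alpha.
def hota_alpha(frames):
    gt_count, pred_count, pair_count = Counter(), Counter(), Counter()
    for gt_ids, pred_ids, matches in frames:
        gt_count.update(gt_ids)
        pred_count.update(pred_ids)
        pair_count.update(matches)

    tp = sum(pair_count.values())          # true-positive detections
    fn = sum(gt_count.values()) - tp       # unmatched ground-truth detections
    fp = sum(pred_count.values()) - tp     # unmatched predicted detections
    if tp == 0:
        return 0.0, 0.0, 0.0

    det_a = tp / (tp + fn + fp)

    # Every TP with the same (gt, pred) identity pair shares one association
    # score A = TPA / (TPA + FNA + FPA) = n / (|gt track| + |pred track| - n),
    # so each pair contributes n * A to the sum over all TPs.
    ass_a = sum(
        n * n / (gt_count[g] + pred_count[p] - n)
        for (g, p), n in pair_count.items()
    ) / tp

    return det_a, ass_a, sqrt(det_a * ass_a)


def hota(frames_by_alpha):
    # Final score: mean of HOTA_alpha over the thresholds provided
    # (HOTA conventionally uses alpha = 0.05, 0.10, ..., 0.95).
    scores = [hota_alpha(frames)[2] for frames in frames_by_alpha.values()]
    return sum(scores) / len(scores)


# Toy video: target 2 switches from prediction "b" to "c"; "d" is a false positive.
frames = [
    ([1, 2], ["a", "b"], [(1, "a"), (2, "b")]),
    ([1, 2], ["a", "c"], [(1, "a"), (2, "c")]),
    ([1], ["a", "d"], [(1, "a")]),
]
det_a, ass_a, h = hota_alpha(frames)
print(f"DetA={det_a:.3f} AssA={ass_a:.3f} HOTA_alpha={h:.3f}")
# prints: DetA=0.833 AssA=0.800 HOTA_alpha=0.816
```

In this toy case the identity switch on target 2 leaves DetA unchanged (only the false positive lowers it) but pulls AssA down to 0.8, which is why HOTA-style metrics penalize trackers that detect the right objects yet lose their identities over time.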

Abstract

Referring Multi-Object Tracking (RMOT) aims to track targets specified by language instructions. However, existing RMOT paradigms heavily rely on explicit visual-textual matching and consequently fail to generalize to complex instructions that require logical reasoning. To overcome this, we propose Reasoning-based Multi-Object Tracking (ReaMOT), a novel task that elevates tracking to a cognitive level, requiring models to infer and track specific targets satisfying implicit constraints via logical reasoning. To advance this field, we construct the ReaMOT Challenge, a comprehensive benchmark featuring a tailored metric suite and a large-scale dataset. This dataset comprises 1,156 language instructions, 423,359 image-language pairs, and 869 distinct video sequences systematically categorized into six evaluation scenarios, with over 75% of the instructions dedicated to High-Level…

Code & Models

Repositories

chen-si-jia/ReaMOT (GitHub)