SOI is the Root of All Evil: Quantifying and Breaking Similar Object Interference in Single Object Tracking
Yipei Wang, Shiyu Hu, Shukun Jia, Panxi Xu, Hongfei Ma, Yiping Ma, Jing Zhang, Xiaobo Lu, Xin Zhao

TL;DR
This paper systematically investigates Similar Object Interference (SOI) in Single Object Tracking, quantifies its impact on tracker performance, and introduces a benchmark (SOIBench) together with a new paradigm that uses vision-language models to improve tracking robustness.
Contribution
The paper is the first to quantify SOI's impact. It also builds SOIBench, a benchmark for evaluating semantic guidance, and proposes VLM-based external guidance to improve tracking performance.
Findings
Eliminating interference sources through Online Interference Masking (OIM) improves tracking accuracy substantially, with AUC gains of up to 4.35 across all SOTA trackers.
Existing vision-language tracking methods underperform with semantic guidance.
VLM-based external guidance improves tracking accuracy, with AUC gains of up to 0.93.
Abstract
In this paper, we present the first systematic investigation and quantification of Similar Object Interference (SOI), a long-overlooked yet critical bottleneck in Single Object Tracking (SOT). Through controlled Online Interference Masking (OIM) experiments, we quantitatively demonstrate that eliminating interference sources leads to substantial performance improvements (AUC gains up to 4.35) across all SOTA trackers, directly validating SOI as a primary constraint for robust tracking and highlighting the feasibility of external cognitive guidance. Building upon these insights, we adopt natural language as a practical form of external guidance, and construct SOIBench, the first semantic cognitive guidance benchmark specifically targeting SOI challenges. It automatically mines SOI frames through multi-tracker collective judgment and introduces a multi-level annotation protocol to generate…
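To make the idea of mining SOI frames through multi-tracker collective judgment more concrete, here is a minimal sketch of one way such a vote could work. It is an illustration under stated assumptions, not the authors' procedure: the function name `mine_soi_frames`, the thresholds `fail_iou`, `hit_iou`, and `min_votes`, and the availability of per-frame distractor boxes are all hypothetical, since the abstract does not specify the actual criterion.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def mine_soi_frames(
    gt: Sequence[Box],
    tracker_preds: Sequence[Sequence[Box]],
    distractors: Sequence[Sequence[Box]],
    fail_iou: float = 0.5,   # assumed threshold: below this, the target counts as lost
    hit_iou: float = 0.5,    # assumed threshold: at or above this, a distractor counts as hit
    min_votes: int = 2,      # assumed number of trackers that must agree
) -> List[int]:
    """Flag frames where several trackers drift from the target onto a similar object.

    This is a hypothetical voting rule, not the criterion used by SOIBench.
    """
    flagged = []
    for t, target in enumerate(gt):
        votes = 0
        for preds in tracker_preds:
            pred = preds[t]
            lost_target = iou(pred, target) < fail_iou
            on_distractor = any(iou(pred, d) >= hit_iou for d in distractors[t])
            if lost_target and on_distractor:
                votes += 1
        if votes >= min_votes:
            flagged.append(t)
    return flagged


# Toy example: two frames, three trackers, one distractor per frame.
gt = [(0, 0, 10, 10), (0, 0, 10, 10)]
distractors = [[(50, 50, 10, 10)], [(50, 50, 10, 10)]]
tracker_preds = [
    [(0, 0, 10, 10), (50, 50, 10, 10)],  # jumps to the distractor in frame 1
    [(1, 1, 10, 10), (51, 51, 10, 10)],  # also jumps in frame 1
    [(0, 0, 10, 10), (0, 0, 10, 10)],    # stays on the target
]
soi_frames = mine_soi_frames(gt, tracker_preds, distractors)
```

In this toy input, two of the three trackers land on the distractor in the second frame, so only that frame is flagged. A real pipeline would need some source of distractor locations, which this sketch simply takes as given.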
