TL;DR
This paper introduces a new global instance tracking task and benchmark to better evaluate and develop trackers that mimic human visual tracking abilities, addressing limitations of existing methods in challenging scenarios.
Contribution
The paper proposes the global instance tracking (GIT) task, constructs the VideoCube benchmark, and develops a scientific evaluation procedure that uses human capabilities as the baseline for assessing tracking intelligence.
Findings
A significant gap remains between current trackers and human performance.
VideoCube establishes a challenging benchmark for tracking in complex scenarios.
An online platform provides evaluation tools and a leaderboard.
Abstract
Target tracking, the essential ability of the human visual system, has been simulated by computer vision tasks. However, existing trackers perform well in austere experimental environments but fail in challenges like occlusion and fast motion. The massive gap indicates that research measures only tracking performance rather than intelligence. How can the intelligence level of trackers be judged scientifically? Distinct from decision-making problems, the lack of three requirements (a challenging task, a fair environment, and a scientific evaluation procedure) makes this question difficult to answer. In this article, we first propose the global instance tracking (GIT) task, which searches for an arbitrary user-specified instance in a video without any assumptions about camera or motion consistency, to model the human visual tracking ability. Whereafter, we construct a high-quality and…
