Understanding and Diagnosing Visual Tracking Systems
Naiyan Wang, Jianping Shi, Dit-Yan Yeung, Jiaya Jia

TL;DR
This paper introduces a modular framework for analyzing visual tracking systems by breaking them into five components, revealing the relative importance of each and providing insights that challenge common beliefs in the field.
Contribution
The paper proposes a novel component-based framework for understanding and diagnosing visual trackers through ablation studies, offering a new baseline and insights into component significance.
Findings
The feature extractor is the most important component of a tracker.
The observation model, although the focus of many studies, often brings no significant improvement.
The ensemble post-processor improves performance when the combined trackers are diverse.
Abstract
Several benchmark datasets for visual tracking research have been proposed in recent years. Despite their usefulness, whether they are sufficient for understanding and diagnosing the strengths and weaknesses of different trackers remains questionable. To address this issue, we propose a framework by breaking a tracker down into five constituent parts, namely, motion model, feature extractor, observation model, model updater, and ensemble post-processor. We then conduct ablative experiments on each component to study how it affects the overall result. Surprisingly, our findings are discrepant with some common beliefs in the visual tracking research community. We find that the feature extractor plays the most important role in a tracker. On the other hand, although the observation model is the focus of many studies, we find that it often brings no significant improvement. Moreover, the…
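The five-part decomposition described in the abstract can be pictured as a set of swappable functions. The following is a minimal sketch under stated assumptions, not the paper's baseline: every name here (`sliding_window`, `raw_pixels`, `template_score`, `blend_update`, `median_ensemble`, `Tracker`) is hypothetical, the frame is a toy grayscale grid, and each component is deliberately the simplest stand-in that fits its role.

```python
from dataclasses import dataclass
from statistics import median
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)
Frame = List[List[float]]        # toy grayscale image: rows of pixel values
Feature = List[float]


def sliding_window(box: Box, radius: int = 2) -> List[Box]:
    """Motion model (assumed): propose candidates around the previous box."""
    x, y, w, h = box
    return [(x + dx, y + dy, w, h)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)]


def raw_pixels(frame: Frame, box: Box) -> Feature:
    """Feature extractor (assumed): flattened pixels, zero outside the frame."""
    x, y, w, h = box
    rows, cols = len(frame), len(frame[0])
    return [frame[r][c] if 0 <= r < rows and 0 <= c < cols else 0.0
            for r in range(y, y + h) for c in range(x, x + w)]


def template_score(template: Feature, feature: Feature) -> float:
    """Observation model (assumed): negative squared distance to the template."""
    return -sum((t - f) ** 2 for t, f in zip(template, feature))


def blend_update(template: Feature, feature: Feature, rate: float = 0.1) -> Feature:
    """Model updater (assumed): move the template toward the new observation."""
    return [(1 - rate) * t + rate * f for t, f in zip(template, feature)]


def median_ensemble(boxes: Sequence[Box]) -> Box:
    """Ensemble post-processor (assumed): coordinate-wise median of outputs."""
    return tuple(int(median(coord)) for coord in zip(*boxes))


@dataclass
class Tracker:
    motion_model: Callable[[Box], List[Box]]
    feature_extractor: Callable[[Frame, Box], Feature]
    observation_model: Callable[[Feature, Feature], float]
    model_updater: Callable[[Feature, Feature], Feature]

    def start(self, frame: Frame, box: Box) -> None:
        # Initialise the target location and appearance from the first frame.
        self.box = box
        self.template = self.feature_extractor(frame, box)

    def step(self, frame: Frame) -> Box:
        # 1. motion model -> 2. features -> 3. scoring -> 4. model update.
        candidates = self.motion_model(self.box)
        feats = [self.feature_extractor(frame, c) for c in candidates]
        scores = [self.observation_model(self.template, f) for f in feats]
        best = max(range(len(candidates)), key=scores.__getitem__)
        self.box = candidates[best]
        self.template = self.model_updater(self.template, feats[best])
        return self.box
```

An ablation in this sketch amounts to building two `Tracker` instances that differ in exactly one argument, for example replacing `raw_pixels` with another feature function while keeping the other three fixed, and comparing their outputs on the same sequence. `median_ensemble` sits outside the class because it operates on the boxes returned by several trackers rather than on a single one.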
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Video Analysis and Summarization
