'Skimming-Perusal' Tracking: A Framework for Real-Time and Robust Long-term Tracking
Bin Yan, Haojie Zhao, Dong Wang, Huchuan Lu, Xiaoyun Yang

TL;DR
This paper introduces a real-time long-term tracking framework built on skimming and perusal modules. It switches between local and global search depending on whether the target is judged present or absent, and is evaluated on long-term tracking benchmarks.
Contribution
The paper proposes a novel long-term tracking framework combining skimming and perusal modules, improving robustness and efficiency over previous methods.
Findings
Reports state-of-the-art performance on the VOT-2018 long-term and OxUvA long-term benchmarks.
Operates in real-time while maintaining high accuracy.
Effectively handles object disappearance and reappearance.
Abstract
Compared with traditional short-term tracking, long-term tracking poses more challenges and is much closer to realistic applications. However, few works have addressed it, and their performance has been limited. In this work, we present a novel robust and real-time long-term tracking framework based on the proposed skimming and perusal modules. The perusal module consists of an effective bounding box regressor to generate a series of candidate proposals and a robust target verifier to infer the optimal candidate with its confidence score. Based on this score, our tracker determines whether the tracked object is present or absent, and then chooses a local or global search strategy, respectively, for the next frame. To speed up the image-wide global search, a novel skimming module is designed to efficiently choose the most likely regions from a large number of…
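The control flow described in the abstract can be sketched as a single per-frame step. This is a minimal illustration, not the authors' implementation: the callables `propose`, `verify`, `skim`, `local_region`, and `global_regions` are hypothetical stand-ins for the learned regressor, verifier, and skimming module, and the `threshold` and `top_k` values are assumptions chosen only for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h); layout is an assumption


@dataclass
class TrackState:
    box: Box        # last accepted target box
    present: bool   # whether the target was judged present in the last frame


def perusal(
    regions: Sequence[Box],
    propose: Callable[[Box], List[Box]],   # hypothetical regressor: region -> candidates
    verify: Callable[[Box], float],        # hypothetical verifier: candidate -> confidence
) -> Tuple[Optional[Box], float]:
    """Return the highest-scoring candidate over all search regions."""
    best_box: Optional[Box] = None
    best_score = float("-inf")
    for region in regions:
        for cand in propose(region):
            score = verify(cand)
            if score > best_score:
                best_box, best_score = cand, score
    return best_box, best_score


def track_step(
    state: TrackState,
    local_region: Callable[[Box], Box],      # hypothetical: search window around last box
    global_regions: Callable[[], List[Box]], # hypothetical: windows tiling the whole image
    skim: Callable[[Box], float],            # hypothetical cheap score for a window
    propose: Callable[[Box], List[Box]],
    verify: Callable[[Box], float],
    threshold: float = 0.5,                  # assumed presence threshold
    top_k: int = 3,                          # assumed number of windows kept after skimming
) -> Tuple[TrackState, float]:
    """One frame: local search if present, skim-then-peruse global search if absent."""
    if state.present:
        regions = [local_region(state.box)]
    else:
        # Skimming: rank all windows cheaply and keep only the most promising ones.
        ranked = sorted(global_regions(), key=skim, reverse=True)
        regions = ranked[:top_k]
    box, score = perusal(regions, propose, verify)
    present = box is not None and score >= threshold
    # Keep the previous box when the target is judged absent.
    new_box = box if present and box is not None else state.box
    return TrackState(box=new_box, present=present), score
```

In this sketch the expensive `perusal` step runs on one window during local search and on at most `top_k` windows during global search, which is the role the abstract assigns to skimming.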
Taxonomy
Topics: Video Surveillance and Tracking Methods · Fire Detection and Safety Systems · Image Enhancement Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
