Learning Policies for Adaptive Tracking with Deep Feature Cascades
Chen Huang, Simon Lucey, Deva Ramanan

TL;DR
This paper introduces an adaptive deep tracking method that dynamically chooses between cheap and expensive features for each frame, significantly boosting speed while maintaining high accuracy.
Contribution
It formulates adaptive tracking as a decision-making process and trains an agent via reinforcement learning to select features, enabling near real-time performance on CPU.
Findings
Runs at 23 fps on a CPU while retaining state-of-the-art accuracy.
Provides a 100x speedup for nearly half of the frames.
Demonstrates the effectiveness of adaptive feature selection in tracking.
Abstract
Visual object tracking is a fundamental and time-critical vision task. Recent years have seen many shallow tracking methods based on real-time pixel-based correlation filters, as well as deep methods that have top performance but need a high-end GPU. In this paper, we learn to improve the speed of deep trackers without losing accuracy. Our fundamental insight is to take an adaptive approach, where easy frames are processed with cheap features (such as pixel values), while challenging frames are processed with invariant but expensive deep features. We formulate the adaptive tracking problem as a decision-making process, and learn an agent to decide whether to locate objects with high confidence on an early layer, or continue processing subsequent layers of a network. This significantly reduces the feed-forward cost for easy frames with distinct or slow-moving objects. We train the agent…
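The early-exit idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper trains an agent with reinforcement learning, whereas the sketch below swaps in a fixed confidence threshold as a stand-in for that learned policy. The stage names, relative costs, scoring functions, and the scalar frame "difficulty" are all hypothetical, chosen only to show how easy frames stop at a cheap stage while hard frames pay for deeper ones.

```python
from typing import Callable, List, Tuple

# Each stage: (name, relative compute cost, confidence function).
# All values are hypothetical; the confidence function maps a frame
# "difficulty" in [0, 1] to a localization confidence in [0, 1].
Stage = Tuple[str, float, Callable[[float], float]]

STAGES: List[Stage] = [
    ("pixels", 1.0, lambda d: 1.0 - d),            # cheap, fragile
    ("mid_layer", 20.0, lambda d: 1.0 - 0.5 * d),  # moderate cost
    ("deep_layer", 100.0, lambda d: 1.0 - 0.1 * d) # expensive, robust
]


def track_frame(difficulty: float, threshold: float = 0.8) -> Tuple[str, float]:
    """Run stages in order and stop at the first confident one.

    The fixed threshold is an assumption standing in for the learned
    stop/continue decision. Returns the name of the stage that produced
    the final estimate and the cumulative cost paid to reach it.
    """
    if not STAGES:
        raise ValueError("at least one stage is required")
    total_cost = 0.0
    for index, (name, cost, confidence_fn) in enumerate(STAGES):
        total_cost += cost  # deeper stages reuse and add to earlier work
        confidence = confidence_fn(difficulty)
        is_last = index == len(STAGES) - 1
        if confidence >= threshold or is_last:
            return name, total_cost
    raise AssertionError("unreachable: the last stage always returns")


if __name__ == "__main__":
    for difficulty in (0.1, 0.3, 0.9):
        stage, cost = track_frame(difficulty)
        print(f"difficulty={difficulty}: stopped at {stage}, cost={cost}")
```

In this toy setup an easy frame exits at the pixel stage for a small fraction of the full cost, which is the mechanism behind the per-frame savings the abstract describes. A learned agent would replace the threshold test with a policy that decides, per stage, whether to stop or continue.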
Videos
Learning Policies for Adaptive Tracking with Deep Feature Cascades · YouTube
Taxonomy
Topics: Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection · Impact of Light on Environment and Health
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
