Transferring Rich Feature Hierarchies for Robust Visual Tracking
Naiyan Wang, Siyi Li, Abhinav Gupta, Dit-Yan Yeung

TL;DR
This paper introduces a visual tracking method that pre-trains a CNN offline to learn rich feature hierarchies and then fine-tunes it online. This addresses the scarcity of labeled training data in tracking, where only the first frame of each video is labeled, and improves tracking robustness.
Contribution
It proposes transferring CNN features pre-trained offline to visual tracking, with the network producing a probability map over the input region instead of a class label, which improves tracking performance.
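To make "probability map instead of class label" concrete, here is a minimal sketch assuming a square output map: the training target is a binary map (1 inside the target's bounding box, 0 outside), predictions are scored with per-pixel binary cross-entropy, and a location is read back from a predicted map. The map size, the box rasterization rule, the threshold-based decoder, and all function names are illustrative assumptions, not the paper's exact specification.

```python
import math

def box_to_target_map(box, map_size):
    """Rasterize a box (x0, y0, x1, y1), in map cells with exclusive x1/y1,
    into a map_size x map_size target: 1.0 inside the box, 0.0 outside."""
    x0, y0, x1, y1 = box
    return [[1.0 if (x0 <= x < x1 and y0 <= y < y1) else 0.0
             for x in range(map_size)]
            for y in range(map_size)]

def pixelwise_bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between a predicted probability map and a target map."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count

def locate_target(prob_map, threshold=0.5):
    """Hypothetical decoder: tightest box around cells with probability >= threshold.
    Returns (x0, y0, x1, y1) with exclusive x1/y1, or None if no cell qualifies."""
    cells = [(x, y) for y, row in enumerate(prob_map)
             for x, p in enumerate(row) if p >= threshold]
    if not cells:
        return None
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)
```

For example, `box_to_target_map((1, 1, 3, 3), map_size=4)` marks a 2×2 block of cells as target; a uniform 0.5 prediction scores `ln 2` (about 0.6931) against it, and `locate_target` on the target map recovers `(1, 1, 3, 3)`. Thresholding is only a simple stand-in for localization; a tracker may instead search over candidate boxes and scales using the map.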
Findings
Substantial performance improvement over state-of-the-art trackers
Effective transfer of rich feature hierarchies for tracking
Successful online fine-tuning for appearance adaptation
Abstract
Convolutional neural network (CNN) models have demonstrated great success in various computer vision tasks including image classification and object detection. However, some equally important tasks such as visual tracking remain relatively unexplored. We believe that a major hurdle that hinders the application of CNN to visual tracking is the lack of properly labeled training data. While existing applications that liberate the power of CNN often need an enormous amount of training data in the order of millions, visual tracking applications typically have only one labeled example in the first frame of each video. We address this research issue here by pre-training a CNN offline and then transferring the rich feature hierarchies learned to online tracking. The CNN is also fine-tuned during online tracking to adapt to the appearance of the tracked target specified in the first video frame.…
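The two-stage idea in the abstract (features learned offline, adaptation online from the single labeled first frame) can be illustrated with a deliberately reduced sketch. It assumes the pre-trained layers are frozen and supply fixed feature vectors for each candidate patch, and it fine-tunes only a logistic output head with SGD on binary cross-entropy. The abstract states that the CNN itself is fine-tuned, so this is a simplification, and the feature vectors, learning rate, epoch count, and function names are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(weights, bias, features):
    """Probability that a candidate (given by its frozen pre-trained features) is the target."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)

def finetune_head(weights, bias, samples, lr=0.5, epochs=200):
    """Online fine-tuning of the output head only.
    samples: list of (feature_vector, label), label 1 = target, 0 = background."""
    w = list(weights)
    b = bias
    for _ in range(epochs):
        for x, y in samples:
            g = score(w, b, x) - y  # gradient of BCE w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def track_step(weights, bias, candidates):
    """Return (index, probability) of the highest-scoring candidate in a new frame."""
    scores = [score(weights, bias, c) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores[best]

# First frame: hypothetical features of target (label 1) and background (label 0) patches.
first_frame = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.0, 1.0], 0), ([0.1, 0.9], 0)]
w, b = finetune_head([0.0, 0.0], 0.0, first_frame)
idx, p = track_step(w, b, [[0.2, 0.8], [0.95, 0.05], [0.5, 0.5]])
```

Here `idx` is `1`, the candidate whose features resemble the first-frame target. In a full tracker the candidates would be patches sampled around the previous location, and the same fine-tuning step could be rerun on later frames to follow appearance changes; how often and on which frames to update is a design choice this sketch leaves open.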
Taxonomy
Topics: Video Surveillance and Tracking Methods · Image Enhancement Techniques · Human Pose and Action Recognition
