Leveraging Tacit Information Embedded in CNN Layers for Visual Tracking
Kourosh Meshgi, Maryam Sadat Mirzaei, Shigeyuki Oba

TL;DR
This paper introduces an adaptive combination of CNN layers, together with style statistics extracted from CNN activations, for visual tracking. Both exploit implicit information in the activations to improve tracker performance.
Contribution
The paper proposes a method that adaptively combines CNN layers within a single tracker and uses style statistics computed from CNN activations to improve visual tracking.
Findings
Enhanced tracking accuracy with style similarity measures.
Improved localization and scale estimation.
Significant performance gains demonstrated in experiments.
Abstract
Different layers in CNNs provide not only different levels of abstraction for describing the objects in the input but also encode various implicit information about them. The activation patterns of different features contain valuable information about the stream of incoming images: spatial relations, temporal patterns, and co-occurrence of spatial and spatiotemporal (ST) features. The studies in visual tracking literature, so far, utilized only one of the CNN layers, a pre-fixed combination of them, or an ensemble of trackers built upon individual layers. In this study, we employ an adaptive combination of several CNN layers in a single DCF tracker to address variations of the target appearances and propose the use of style statistics on both spatial and temporal properties of the target, directly extracted from CNN layers for visual tracking. Experiments demonstrate that using the…
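The two ideas in the abstract, style statistics and an adaptive layer combination, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the style statistic is taken to be the Gram matrix of channel activations (the common choice in neural style transfer), the style distance is a squared Frobenius difference, and the per-layer weights are supplied by hand rather than learned. The function names `gram_matrix`, `style_distance`, and `combine_layers` are hypothetical.

```python
from typing import List

# A feature map with C channels, each flattened to H*W activations.
FeatureMap = List[List[float]]


def gram_matrix(features: FeatureMap) -> List[List[float]]:
    """Channel co-occurrence statistics (assumed style descriptor).

    Entry (i, j) is the mean product of channel i and channel j over all
    spatial positions, so the spatial layout itself is discarded.
    """
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
        for fi in features
    ]


def style_distance(f_a: FeatureMap, f_b: FeatureMap) -> float:
    """Squared Frobenius distance between the two Gram matrices."""
    g_a, g_b = gram_matrix(f_a), gram_matrix(f_b)
    return sum(
        (x - y) ** 2
        for row_a, row_b in zip(g_a, g_b)
        for x, y in zip(row_a, row_b)
    )


def combine_layers(scores: List[float], weights: List[float]) -> float:
    """Weighted average of per-layer scores (weights are hypothetical).

    In an adaptive scheme the weights would be updated per frame; here
    they are plain inputs.
    """
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total


# Two patches with the same activations in a different spatial order
# have identical Gram matrices, hence zero style distance.
patch_a = [[1.0, 2.0], [3.0, 4.0]]
patch_b = [[2.0, 1.0], [4.0, 3.0]]
distance_same_style = style_distance(patch_a, patch_b)

# Equal weights reduce the combination to a plain mean of layer scores.
combined_score = combine_layers([1.0, 3.0], [1.0, 1.0])
```

Because the Gram matrix sums over spatial positions, it describes which features co-occur rather than where they occur. That is why a descriptor of this kind would complement, rather than replace, the spatially sensitive response of a correlation filter.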
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Face recognition and analysis
