TL;DR
This paper introduces a framework for learning continuous convolution filters for visual tracking. By integrating multi-resolution deep feature maps in the continuous spatial domain, it improves accuracy and robustness over conventional Discriminative Correlation Filter (DCF) trackers and supports sub-pixel localization.
Contribution
It proposes a novel continuous-domain formulation for learning convolution filters, enabling multi-resolution feature integration and sub-pixel localization in visual tracking.
Findings
Improved tracking accuracy on OTB-2015 (+5.1% in mean overlap precision, OP) and Temple-Color (+4.6% in mean OP)
20% relative reduction in failure rate on VOT2015
Effective sub-pixel localization demonstrated in feature point tracking
Abstract
Discriminative Correlation Filters (DCF) have demonstrated excellent performance for visual object tracking. The key to their success is the ability to efficiently exploit available negative data by including all shifted versions of a training sample. However, the underlying DCF formulation is restricted to single-resolution feature maps, significantly limiting its potential. In this paper, we go beyond the conventional DCF framework and introduce a novel formulation for training continuous convolution filters. We employ an implicit interpolation model to pose the learning problem in the continuous spatial domain. Our proposed formulation enables efficient integration of multi-resolution deep feature maps, leading to superior results on three object tracking benchmarks: OTB-2015 (+5.1% in mean OP), Temple-Color (+4.6% in mean OP), and VOT2015 (20% relative reduction in failure rate).…
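The idea of posing the problem in the continuous spatial domain can be illustrated with a minimal 1-D sketch. It is not the paper's implementation: it assumes plain trigonometric (Fourier series) interpolation in place of the paper's interpolation model, uses two synthetic cosine "channels" instead of deep features, and locates the peak with a dense grid. The helper names `interpolate` and `sample_channel` are hypothetical. The point is only that channels sampled at different resolutions can be summed once each is treated as a continuous periodic function, and that the resulting peak need not lie on either sampling grid.

```python
import numpy as np

def interpolate(samples, t, period=1.0):
    """Evaluate the trigonometric interpolant of `samples` at positions `t`.

    Assumption: samples are taken at period * n / N for n = 0..N-1 and the
    underlying signal is periodic. This stands in for the paper's
    interpolation model, which is not reproduced here.
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    k = np.fft.fftfreq(n, d=1.0 / n)       # integer frequencies 0, 1, ..., -1
    coeffs = np.fft.fft(samples) / n       # Fourier series coefficients
    phases = np.exp(2j * np.pi * np.outer(np.atleast_1d(t), k) / period)
    return np.real(phases @ coeffs)

def sample_channel(freq, n, peak=0.3):
    """Synthetic channel: a cosine with a maximum at `peak`, sampled n times."""
    grid = np.arange(n) / n
    return np.cos(2 * np.pi * freq * (grid - peak))

# Two channels on the same interval [0, 1) but at different resolutions.
coarse = sample_channel(freq=1, n=8)    # grid spacing 0.125
fine = sample_channel(freq=2, n=16)     # grid spacing 0.0625

# Sum the channels in the continuous domain and search a dense grid.
t = np.arange(1000) / 1000
score = interpolate(coarse, t) + interpolate(fine, t)
t_hat = t[np.argmax(score)]
print(f"estimated peak: {t_hat:.3f}")
```

The true peak at 0.3 falls between the sample positions of both channels, so an argmax over either discrete grid could not return it; evaluating the summed interpolants on a finer grid can.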
Taxonomy
Methods: Convolution
