ECO: Efficient Convolution Operators for Tracking
Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, Michael Felsberg

TL;DR
ECO revisits the core Discriminative Correlation Filter (DCF) formulation for visual tracking to reduce both computational complexity and over-fitting, improving speed and accuracy at the same time across four benchmarks.
Contribution
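The paper proposes three components: a factorized convolution operator that drastically reduces the number of model parameters, a compact generative model of the training sample distribution that reduces memory and time complexity, and a conservative model update strategy aimed at better speed and robustness.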
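The idea behind a factorized convolution operator can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes discrete 1-D circular convolution, and the names `P`, `factorized_response`, and `full_response` are hypothetical. Instead of learning one filter per feature channel (D filters), it learns C < D filters plus a D x C coefficient matrix, and checks that projecting the features first gives the same response as building every per-channel filter explicitly.

```python
import numpy as np

def circ_conv(a, b):
    """Circular 1-D convolution computed with the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def factorized_response(x, f, P):
    """Score using C filters and a D x C coefficient matrix P.

    x: (D, N) feature channels, f: (C, N) filters, P: (D, C) coefficients.
    The features are projected down to C channels, then filtered.
    """
    z = P.T @ x  # (C, N) projected feature channels
    return sum(circ_conv(f[c], z[c]) for c in range(f.shape[0]))

def full_response(x, f, P):
    """Same score, but with one explicit filter per feature channel."""
    g = P @ f  # (D, N): each channel filter is a linear combination of the C filters
    return sum(circ_conv(g[d], x[d]) for d in range(x.shape[0]))

rng = np.random.default_rng(0)
D, C, N = 8, 3, 16  # illustrative sizes, not values from the paper
x = rng.standard_normal((D, N))
f = rng.standard_normal((C, N))
P = rng.standard_normal((D, C))

# Both formulations give the same response by linearity of convolution.
same = np.allclose(factorized_response(x, f, P), full_response(x, f, P))

# Parameter counts: D filters of length N versus C filters plus the matrix P.
params_full = D * N
params_factorized = C * N + D * C
```

With these sizes the factorized form stores fewer parameters than the full filter bank, and only C convolutions are needed per evaluation instead of D.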
Findings
20-fold speedup over the top-ranked method in the VOT2016 challenge when using deep features
65.0% AUC on OTB-2015 with the fast variant using hand-crafted features
Evaluated on four benchmarks: VOT2016, UAV123, OTB-2015, and TempleColor
Abstract
In recent years, Discriminative Correlation Filter (DCF) based methods have significantly advanced the state-of-the-art in tracking. However, in the pursuit of ever increasing tracking performance, their characteristic speed and real-time capability have gradually faded. Further, the increasingly complex models, with a massive number of trainable parameters, have introduced the risk of severe over-fitting. In this work, we tackle the key causes behind the problems of computational complexity and over-fitting, with the aim of simultaneously improving both speed and performance. We revisit the core DCF formulation and introduce: (i) a factorized convolution operator, which drastically reduces the number of parameters in the model; (ii) a compact generative model of the training sample distribution, which significantly reduces memory and time complexity, while providing better diversity of…
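To make the notion of a compact sample model concrete, here is a simplified sketch of keeping a bounded set of weighted components instead of storing every training sample. It is an assumption-laden illustration rather than the paper's procedure: the function `add_sample`, the learning rate `gamma`, the capacity `max_components`, and the rule of always merging the two closest components are all hypothetical choices made for this example.

```python
import numpy as np

def add_sample(means, weights, x, gamma=0.1, max_components=3):
    """Add sample x as a new component, then merge if over capacity.

    means: list of 1-D arrays (component means).
    weights: list of floats that sum to 1 (empty before the first sample).
    """
    new_w = gamma if weights else 1.0  # the first sample takes all the weight
    weights = [w * (1.0 - new_w) for w in weights] + [new_w]
    means = [m.copy() for m in means] + [np.asarray(x, dtype=float)]

    if len(means) > max_components:
        # Find the two components whose means are closest.
        best = None
        for i in range(len(means)):
            for j in range(i + 1, len(means)):
                dist = float(np.sum((means[i] - means[j]) ** 2))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        # Merge them into one component with a weighted mean.
        w = weights[i] + weights[j]
        merged = (weights[i] * means[i] + weights[j] * means[j]) / w
        means = [m for k, m in enumerate(means) if k not in (i, j)] + [merged]
        weights = [v for k, v in enumerate(weights) if k not in (i, j)] + [w]
    return means, weights

means, weights = [], []
for sample in ([0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.3, 5.0], [10.0, 10.0]):
    means, weights = add_sample(means, weights, sample)

n_components = len(means)
total_weight = sum(weights)
```

Memory stays bounded by `max_components` no matter how many frames are processed, and near-duplicate samples collapse into a single component, which is one way a compact model can keep the stored set diverse.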
Taxonomy
Topics: Video Surveillance and Tracking Methods · Image Enhancement Techniques · Human Pose and Action Recognition
