CREST: Convolutional Residual Learning for Visual Tracking
Yibing Song, Chao Ma, Lijun Gong, Jiawei Zhang, Rynson Lau, Ming-Hsuan Yang

TL;DR
CREST introduces an end-to-end trainable convolutional residual learning framework for visual tracking, integrating feature extraction, response generation, and model updating to improve robustness and performance.
Contribution
It reformulates discriminative correlation filters as a neural network, enabling end-to-end training and residual learning for better online adaptation.
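The reformulation can be written out in standard DCF notation. This is a sketch: the symbols W (filter), X (features), Y (Gaussian-shaped label), \lambda (regularization weight), \eta (update weight), and the subscripted F_B and F_R are assumptions chosen for illustration, not taken from this page.

```latex
% Ridge-regression objective of a conventional DCF
W^{*} = \arg\min_{W} \; \lVert W \ast X - Y \rVert^{2} + \lambda \lVert W \rVert^{2}

% The same objective read as the loss of a one-layer convolutional network,
% where the layer output is F(X) = W \ast X
L(W) = \lVert F(X) - Y \rVert^{2} + \lambda \lVert W \rVert^{2}

% Conventional online update: moving average with an empirical weight \eta
W_{t} = (1 - \eta)\, W_{t-1} + \eta\, \hat{W}_{t}

% Residual learning: a base response plus a learned correction
F(X) = F_{B}(X) + F_{R}(X)
```

Because the loss is differentiable in W, the filter can be trained by gradient descent together with the layers around it, rather than solved in closed form and blended with a fixed \eta.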
Findings
CREST outperforms state-of-the-art trackers on benchmark datasets.
End-to-end training improves tracking robustness.
Residual learning reduces model degradation during updates.
Abstract
Discriminative correlation filters (DCFs) have been shown to perform superiorly in visual tracking. They only need a small set of training samples from the initial frame to generate an appearance model. However, existing DCFs learn the filters separately from feature extraction, and update these filters using a moving average operation with an empirical weight. These DCF trackers hardly benefit from the end-to-end training. In this paper, we propose the CREST algorithm to reformulate DCFs as a one-layer convolutional neural network. Our method integrates feature extraction, response map generation as well as model update into the neural networks for an end-to-end training. To reduce model degradation during online update, we apply residual learning to take appearance changes into account. Extensive experiments on the benchmark datasets demonstrate that our CREST tracker performs…
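The idea in the abstract can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration: random arrays stand in for deep features, each branch is a single linear filter, and the helper names (`gaussian_label`, `windows`, `respond`, `fit`) are hypothetical. It is not the authors' architecture, only a toy instance of "correlation filter as one convolutional layer, plus a residual correction".

```python
import numpy as np

def gaussian_label(h, w, sigma=1.5):
    """Soft regression target peaked at the patch centre (assumed target position)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))

def windows(x, k):
    """All k x k windows of x under zero 'same' padding, shape (H, W, k, k)."""
    xp = np.pad(x, k // 2)
    return np.lib.stride_tricks.sliding_window_view(xp, (k, k))

def respond(x, w):
    """One convolutional layer (cross-correlation): response map of filter w on x."""
    return np.einsum("hwij,ij->hw", windows(x, w.shape[0]), w)

def fit(x, target, k=5, lam=1e-3, steps=300):
    """Gradient descent on ||respond(x, w) - target||^2 + lam * ||w||^2."""
    win = windows(x, k)
    w = np.zeros((k, k))
    # Conservative step size: small enough that every step lowers the loss.
    lr = 0.5 / (np.sum(win ** 2) + lam)
    for _ in range(steps):
        err = np.einsum("hwij,ij->hw", win, w) - target
        grad = 2 * np.einsum("hw,hwij->ij", err, win) + 2 * lam * w
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
x1 = rng.standard_normal((15, 15))             # stand-in for first-frame features
y = gaussian_label(15, 15)
w_base = fit(x1, y)                            # base layer learned on the first frame

x2 = x1 + 0.3 * rng.standard_normal((15, 15))  # later frame with appearance change
base_resp = respond(x2, w_base)
w_res = fit(x2, y - base_resp)                 # residual branch learns the remaining gap
total_resp = base_resp + respond(x2, w_res)

e_base = np.sum((base_resp - y) ** 2)          # error of the base layer alone
e_total = np.sum((total_resp - y) ** 2)        # error after adding the residual
```

In this toy setup the base filter is left untouched and only the residual branch absorbs the appearance change, which is one way to read the abstract's claim that residual learning limits model degradation during online updates.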
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Fire Detection and Safety Systems
