# Distilled Siamese Networks for Visual Tracking

**Authors:** Jianbing Shen, Yuanpei Liu, Xingping Dong, Xiankai Lu, Fahad Shahbaz Khan, Steven Hoi

arXiv: 1907.10586 · 2022-11-30

## TL;DR

This paper introduces a knowledge distillation framework for Siamese visual trackers that produces small, fast, and accurate student models suited to memory-constrained mobile devices, with tracking accuracy comparable to the larger base models.

## Contribution

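The paper proposes a teacher-students distillation framework for Siamese trackers. It combines a tracking-specific teacher-student distillation module with a student-student knowledge sharing mechanism for mutual learning, reducing memory cost and increasing speed while keeping accuracy comparable to the base models.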

## Key findings

- Achieves compression rates of up to 18×
- Reaches frame rates of 265 FPS
- Keeps tracking accuracy comparable to the base models across five tracking benchmarks

## Abstract

In recent years, Siamese network based trackers have significantly advanced the state-of-the-art in real-time tracking. Despite their success, Siamese trackers tend to suffer from high memory costs, which restrict their applicability to mobile devices with tight memory budgets. To address this issue, we propose a distilled Siamese tracking framework to learn small, fast and accurate trackers (students), which capture critical knowledge from large Siamese trackers (teachers) by a teacher-students knowledge distillation model. This model is intuitively inspired by the one teacher vs. multiple students learning method typically employed in schools. In particular, our model contains a single teacher-student distillation module and a student-student knowledge sharing mechanism. The former is designed using a tracking-specific distillation strategy to transfer knowledge from a teacher to students. The latter is utilized for mutual learning between students to enable in-depth knowledge understanding. Extensive empirical evaluations on several popular Siamese trackers demonstrate the generality and effectiveness of our framework. Moreover, the results on five tracking benchmarks show that the proposed distilled trackers achieve compression rates of up to 18$\times$ and frame-rates of $265$ FPS, while obtaining comparable tracking accuracy compared to base models.
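The two components named in the abstract, teacher-to-student transfer and student-to-student mutual learning, can be illustrated with a minimal sketch. This summary does not give the paper's tracking-specific loss terms, so the code below is an assumption: it substitutes a generic softened-logit KL divergence for both terms. The names `student_losses`, `temperature`, and `mutual_weight` are hypothetical and do not come from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def student_losses(teacher_logits, student_a_logits, student_b_logits,
                   temperature=2.0, mutual_weight=0.5):
    """Return one loss per student (illustrative, not the paper's losses).

    Each student loss has two parts:
      1. a teacher-student term pulling the student toward the teacher;
      2. a student-student term pulling the student toward its peer.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_a = softmax(student_a_logits, temperature)
    p_b = softmax(student_b_logits, temperature)

    loss_a = kl_divergence(p_teacher, p_a) + mutual_weight * kl_divergence(p_b, p_a)
    loss_b = kl_divergence(p_teacher, p_b) + mutual_weight * kl_divergence(p_a, p_b)
    return loss_a, loss_b


if __name__ == "__main__":
    # Hypothetical foreground/background scores for one anchor location.
    loss_a, loss_b = student_losses([2.0, 0.5], [0.1, 0.3], [1.0, 0.9])
    print(loss_a, loss_b)
```

In this sketch both losses are zero when the two students reproduce the teacher's distribution exactly, and positive otherwise. In an actual tracker the same pattern would apply per output location of the Siamese head rather than to a single pair of scores.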

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.10586/full.md

## Figures

23 figures with captions in the complete paper: https://tomesphere.com/paper/1907.10586/full.md

## References

86 references — full list in the complete paper: https://tomesphere.com/paper/1907.10586/full.md

---
Source: https://tomesphere.com/paper/1907.10586