# Security in Transformer Visual Trackers: A Case Study on the Adversarial Robustness of Two Models

**Authors:** Peng Ye, Yuanfang Chen, Sihang Ma, Feng Xue, Noel Crespi, Xiaohan Chen, Xing Fang

PMC · DOI: 10.3390/s24144761 · Sensors (Basel, Switzerland) · 2024-07-22

## TL;DR

This paper studies the security of transformer-based visual trackers used in autonomous driving and shows they are vulnerable to adversarial attacks that significantly reduce tracking performance.

## Contribution

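The paper introduces a method to generate adversarial examples for transformer-based visual trackers that accounts for frame-by-frame temporal motion, and evaluates it on the OTB100, VOT2018, and GOT-10k benchmarks, where white-box attack success rates exceeded 90%.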

## Key findings

- Adversarial attacks on transformer-based visual trackers caused significant performance degradation.
- White-box attacks achieved over 90% success rates in disrupting tracking performance.
- Temporal motion was considered when generating adversarial perturbations, enhancing attack effectiveness.
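
To make the success-rate figure concrete, here is a minimal sketch of one way such a rate could be computed from bounding-box overlap (intersection over union, IoU). This summary does not give the paper's exact definition, so the 0.5 threshold, the `(x, y, width, height)` box format, and the function names `iou` and `attack_success_rate` are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Assumed box format: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def attack_success_rate(
    clean: Sequence[Box],
    attacked: Sequence[Box],
    truth: Sequence[Box],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> float:
    """Fraction of frames tracked correctly without the attack that are
    lost (IoU below the threshold) once the attack is applied."""
    tracked = [
        (a, t)
        for c, a, t in zip(clean, attacked, truth)
        if iou(c, t) >= threshold
    ]
    if not tracked:
        return 0.0
    lost = sum(1 for a, t in tracked if iou(a, t) < threshold)
    return lost / len(tracked)
```

Under this assumed definition, frames the tracker already loses without any attack are excluded, so the rate reflects only the damage attributable to the perturbation.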

## Abstract

Visual object tracking is an important technology in camera-based sensor networks and is widely used in autonomous driving systems. A transformer is a deep learning model built on self-attention, which weights the significance of each part of the input data differently, and it has been widely applied to visual tracking. However, the security of the transformer model remains unclear, which exposes transformer-based applications to security threats. This work investigates the security of the transformer model through an important component of autonomous driving: visual tracking. Because deep-learning-based visual trackers are vulnerable to adversarial attacks, adversarial attacks were used as the security threat under investigation. First, adversarial examples were generated on video sequences to degrade tracking performance, and frame-by-frame temporal motion was taken into account when generating perturbations over the tracking results. Then, the influence of the perturbations on performance was investigated and analyzed sequentially. Finally, extensive experiments on the OTB100, VOT2018, and GOT-10k data sets demonstrated that the adversarial examples effectively degraded the performance of transformer-based visual trackers. White-box attacks were the most effective, with attack success rates exceeding 90% against transformer-based trackers.
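
The idea of carrying perturbations across frames can be illustrated with a toy sketch. It is not the authors' method: the summary does not describe their loss or update rule, so the code below swaps in a generic sign-gradient step with a perturbation budget, and it replaces the transformer tracker with a hypothetical linear score (`toy_score`) whose gradient is known in closed form. The names `perturb_sequence`, `eps`, `alpha`, and `steps` are illustrative assumptions. In a real white-box attack the gradient would come from backpropagation through the tracker.

```python
from typing import List

Frame = List[float]  # flattened pixel intensities in [0, 1]


def toy_score(frame: Frame, weights: List[float]) -> float:
    """Hypothetical stand-in for a tracker's confidence score (linear)."""
    return sum(p * w for p, w in zip(frame, weights))


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def clamp(v: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, v))


def perturb_sequence(
    frames: List[Frame],
    weights: List[float],
    eps: float = 8 / 255,    # assumed per-pixel perturbation budget
    alpha: float = 2 / 255,  # assumed step size
    steps: int = 4,
) -> List[Frame]:
    """Lower the toy score on every frame with sign-gradient steps.

    The perturbation found for one frame is reused as the starting point
    for the next, a simple way to exploit temporal continuity.
    """
    delta = [0.0] * len(weights)
    adversarial = []
    for frame in frames:
        for _ in range(steps):
            # For a linear score, the gradient w.r.t. each pixel is its weight.
            delta = [
                clamp(d - alpha * sign(w), -eps, eps)
                for d, w in zip(delta, weights)
            ]
        adversarial.append(
            [clamp(p + d, 0.0, 1.0) for p, d in zip(frame, delta)]
        )
    return adversarial
```

Because `delta` is not reset between frames, later frames start from an already effective perturbation. Whether the paper uses this particular warm-start scheme is not stated in this summary.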

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11281126/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11281126/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/PMC11281126/full.md

---
Source: https://tomesphere.com/paper/PMC11281126