Learning Motion Blur Robust Vision Transformers for Real-Time UAV Tracking

You Wu, Xucheng Wang, Dan Zeng, Hengzhou Ye, Xiaolan Xie, Qijun Zhao, and Shuiwang Li

arXiv:2407.05383·cs.CV·August 22, 2025

TL;DR

This paper introduces BDTrack, a real-time UAV tracking method that enhances vision transformers with adaptive computation and motion blur robustness, improving efficiency and accuracy in challenging UAV scenarios.

Contribution

The paper proposes an adaptive computation framework for vision transformers and a motion blur invariant feature learning method, specifically tailored for real-time UAV tracking.
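One way to picture motion blur invariant feature learning is as a penalty on the distance between the features of a sharp image and the features of a blurred copy of it. The sketch below is illustrative only: the page does not state the loss the authors use, so the mean squared error form, the horizontal box-blur kernel, the patch-mean feature extractor, and all function names here are assumptions.

```python
import numpy as np

def horizontal_motion_blur(image, length=5):
    """Blur a 2-D grayscale image by averaging `length` neighbours along each row.

    Assumed blur model: a uniform horizontal line kernel with zero padding at
    the row ends (np.convolve with mode="same").
    """
    kernel = np.ones(length) / length
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, image
    )

def toy_features(image, patch=4):
    """Hypothetical stand-in for a backbone: mean intensity of each patch."""
    h, w = image.shape
    h, w = h - h % patch, w - w % patch  # crop so the image tiles evenly
    patches = image[:h, :w].reshape(h // patch, patch, w // patch, patch)
    return patches.mean(axis=(1, 3)).ravel()

def blur_invariance_loss(image, length=5):
    """Mean squared error between features of the sharp and blurred image."""
    sharp = toy_features(image)
    blurred = toy_features(horizontal_motion_blur(image, length))
    return float(np.mean((sharp - blurred) ** 2))
```

In a real tracker a term of this kind would presumably be added to the tracking loss during training, with a learned ViT backbone in place of `toy_features`; that detail is also an assumption here.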

Findings

01. Achieves real-time performance with efficient computation.
02. Improves robustness to motion blur in UAV tracking.
03. Validated on four benchmark datasets with superior results.

Abstract

Unmanned aerial vehicle (UAV) tracking is critical for applications like surveillance, search-and-rescue, and autonomous navigation. However, the high-speed movement of UAVs and targets introduces unique challenges, including real-time processing demands and severe motion blur, which degrade the performance of existing generic trackers. While single-stream vision transformer (ViT) architectures have shown promise in visual tracking, their computational inefficiency and lack of UAV-specific optimizations limit their practicality in this domain. In this paper, we boost the efficiency of this framework by tailoring it into an adaptive computation framework that dynamically exits Transformer blocks for real-time UAV tracking. The motivation behind this is that tracking tasks with fewer challenges can be adequately addressed using low-level feature representations. Simpler tasks can often be…
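The idea of dynamically exiting Transformer blocks can be sketched as a loop that stops as soon as an exit gate is confident enough. This is a minimal illustration under stated assumptions, not the BDTrack implementation: the residual `tanh` block standing in for a Transformer block, the sigmoid gate on the mean token, the threshold rule, and every name below are hypothetical.

```python
import numpy as np

def make_block(dim, rng):
    """Return a toy residual block standing in for one Transformer block."""
    w = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def block(tokens):
        return tokens + np.tanh(tokens @ w)

    return block

def exit_score(tokens, gate_w):
    """Hypothetical exit gate: sigmoid of a linear readout of the mean token."""
    logit = float(tokens.mean(axis=0) @ gate_w)
    return 1.0 / (1.0 + np.exp(-logit))

def forward_with_early_exit(tokens, blocks, gate_ws, threshold=0.5):
    """Run blocks in order and stop once a gate score exceeds `threshold`.

    Returns the final tokens and the number of blocks actually executed.
    """
    used = 0
    for block, gate_w in zip(blocks, gate_ws):
        tokens = block(tokens)
        used += 1
        if exit_score(tokens, gate_w) > threshold:
            break  # easy input: skip the remaining blocks
    return tokens, used
```

With a threshold above 1 the gate never fires and every block runs; with a threshold below 0 the loop exits after the first block. Easy frames would sit between those extremes and use only a few blocks, which is where the compute saving would come from.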

Taxonomy

Topics: Advanced Vision and Imaging · Infrared Target Detection Methodologies · Robotics and Sensor-Based Localization

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Softmax · Byte Pair Encoding · Layer Normalization · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam