Local Perception-Aware Transformer for Aerial Tracking
Changhong Fu, Weiyu Peng, Sihang Li, Junjie Ye, Ziang Cao

TL;DR
This paper introduces a local perception-aware transformer for aerial tracking that enhances local detail modeling and reduces global redundancy interference, leading to improved accuracy and robustness in aerial benchmarks.
Contribution
It proposes a novel local-recognition encoder, combining a local-recognition attention with a local element correction network, to improve local detail modeling in aerial object tracking.
Findings
Achieves competitive accuracy on aerial benchmarks
Demonstrates robustness in real-world tests
Enhances local feature modeling in transformer-based tracking
Abstract
Transformer-based visual object tracking has been utilized extensively. However, the Transformer structure lacks sufficient inductive bias. In addition, focusing only on encoding global features harms the modeling of local details, which restricts tracking capability on aerial robots. Specifically, with a local-modeling to global-search mechanism, the proposed tracker replaces the global encoder with a novel local-recognition encoder. In this encoder, a local-recognition attention and a local element correction network are carefully designed to reduce interference from globally redundant information and to increase local inductive bias. Meanwhile, the latter can model local object details precisely under aerial view through a detail-inquiry net. The proposed method achieves competitive accuracy and robustness in several authoritative aerial benchmarks with 316 sequences in…
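The general idea of restricting attention to a local neighborhood can be illustrated with a minimal sketch. This is not the paper's local-recognition attention, whose exact formulation is not given in the abstract; it is a generic windowed attention over a 1D token sequence, assuming a single head and no learned projections. The function name `local_attention` and the `radius` parameter are hypothetical and chosen only for illustration.

```python
import math


def local_attention(queries, keys, values, radius):
    """Single-head scaled dot-product attention restricted to a local window.

    Token i attends only to tokens j with |i - j| <= radius.
    Illustrative assumption: this is NOT the paper's exact mechanism.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for i, q in enumerate(queries):
        # Indices of the neighbors that fall inside the local window.
        lo = max(0, i - radius)
        hi = min(len(keys), i + radius + 1)
        window = range(lo, hi)
        # Scaled dot-product scores against the windowed keys only.
        scores = [scale * sum(a * b for a, b in zip(q, keys[j])) for j in window]
        # Numerically stable softmax over the window.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the windowed values.
        out = [
            sum(w * values[j][d] for w, j in zip(weights, window))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# With radius 0 each token attends only to itself, so the output equals the values.
identity_out = local_attention(tokens, tokens, tokens, radius=0)
```

With a small `radius`, tokens outside the window cannot influence a given output, which is one simple way to inject a locality bias. A radius covering the whole sequence recovers ordinary global attention.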
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Vision and Imaging · Robotics and Sensor-Based Localization
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Dense Connections · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding
