Local Perception-Aware Transformer for Aerial Tracking

Changhong Fu; Weiyu Peng; Sihang Li; Junjie Ye; Ziang Cao

arXiv:2208.00662 · cs.CV · August 9, 2022

TL;DR

This paper introduces a local perception-aware transformer for aerial tracking that enhances local detail modeling and reduces global redundancy interference, leading to improved accuracy and robustness in aerial benchmarks.

Contribution

It proposes a local-recognition encoder, built from a local-recognition attention and a local element correction network, that improves local detail modeling in aerial object tracking.

Findings

01 Achieves competitive accuracy on aerial benchmarks

02 Demonstrates robustness in real-world tests

03 Enhances local feature modeling in transformer-based tracking

Abstract

Transformer-based visual object tracking has been utilized extensively. However, the Transformer structure lacks sufficient inductive bias. In addition, focusing only on encoding global features harms the modeling of local details, which restricts tracking capability on aerial robots. Specifically, with a local-modeling to global-search mechanism, the proposed tracker replaces the global encoder with a novel local-recognition encoder. In this encoder, a local-recognition attention and a local element correction network are designed to reduce interference from globally redundant information and to increase local inductive bias. Meanwhile, the latter can precisely model local object details under the aerial view through a detail-inquiry net. The proposed method achieves competitive accuracy and robustness on several authoritative aerial benchmarks with 316 sequences in…
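To make the idea of limiting attention to a local neighborhood concrete, here is a minimal sketch of generic windowed scaled dot-product attention. It is not the authors' local-recognition attention or correction network (see the linked repository for those); the function name `local_window_attention`, the 1-D token layout, and the `window` parameter are assumptions chosen purely for illustration.

```python
import math


def local_window_attention(queries, keys, values, window):
    """Generic windowed attention (illustrative assumption, not the paper's module).

    Token i attends only to tokens j with |i - j| <= window, so distant
    (globally redundant) tokens cannot influence its output.
    """
    n = len(queries)
    scale = 1.0 / math.sqrt(len(queries[0]))
    value_dim = len(values[0])
    outputs = []
    for i in range(n):
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        # Scaled dot-product scores, restricted to the local window.
        scores = [
            scale * sum(q * k for q, k in zip(queries[i], keys[j]))
            for j in range(lo, hi)
        ]
        # Numerically stable softmax over the window only.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the values inside the window.
        outputs.append([
            sum(w * values[j][d] for w, j in zip(weights, range(lo, hi)))
            for d in range(value_dim)
        ])
    return outputs


# Hypothetical toy tokens: four 2-D feature vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
local_out = local_window_attention(tokens, tokens, tokens, window=1)
```

With `window=0` each token attends only to itself and the output equals `values`; with `window >= len(tokens) - 1` the function reduces to ordinary global attention, which is the setting the abstract argues can harm local detail modeling.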

Code & Models

Repositories

vision4robotics/lpat (PyTorch, official)

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Vision and Imaging · Robotics and Sensor-Based Localization

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Dense Connections · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding