TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios
Xingkui Zhu, Shuchang Lyu, Xu Wang, Qi Zhao

TL;DR
This paper introduces TPH-YOLOv5, a YOLOv5-based object detector for drone-captured scenes. It adds an extra prediction head, replaces the original heads with Transformer Prediction Heads (TPH), integrates CBAM attention, and applies strategies such as data augmentation to improve accuracy on challenging drone imagery.
Contribution
The paper proposes a YOLOv5-based model with transformer prediction heads and attention modules that achieves state-of-the-art AP on the DET-test-challenge drone dataset.
Findings
TPH-YOLOv5 outperforms previous SOTA methods by 1.81% AP.
The model achieves 39.18% AP on the VisDrone2021 DET-test-challenge dataset.
TPH-YOLOv5 improves baseline YOLOv5 by about 7% in AP.
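As a quick sanity check on these figures, the AP of the previous best method can be recovered by subtraction. This is only a sketch: it assumes the 1.81% margin is an absolute difference in AP points measured on the same DET-test-challenge split as the 39.18% result, which the summary does not state explicitly. The variable names are illustrative.

```python
# Figures quoted in the Findings list.
tph_yolov5_ap = 39.18          # AP (%) on DET-test-challenge
margin_over_prev_sota = 1.81   # assumed to be absolute AP points

# Implied AP of the previous SOTA method under that assumption.
implied_prev_sota_ap = round(tph_yolov5_ap - margin_over_prev_sota, 2)
print(implied_prev_sota_ap)    # 37.37
```

The "about 7%" gain over baseline YOLOv5 is left out of this calculation because the summary does not say which split it was measured on.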
Abstract
Object detection in drone-captured scenarios has recently become a popular task. Because drones fly at varying altitudes, object scale varies drastically, which makes the network harder to optimize. Moreover, high-speed, low-altitude flight introduces motion blur on densely packed objects, which makes them hard to distinguish. To address these two issues, we propose TPH-YOLOv5. Building on YOLOv5, we add one more prediction head to detect objects at different scales. We then replace the original prediction heads with Transformer Prediction Heads (TPH) to exploit the prediction potential of the self-attention mechanism. We also integrate the convolutional block attention module (CBAM) to find attention regions in scenes with dense objects. To further improve TPH-YOLOv5, we provide a bag of useful strategies such as data augmentation,…
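The self-attention idea behind a transformer prediction head can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a single attention head, random projection weights, and no positional encoding, layer normalization, residual connection, or feed-forward layer. The function name `self_attention_on_feature_map` and all shapes are hypothetical. Each spatial location of a feature map is treated as a token, and every token attends to every other token.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_on_feature_map(feat, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a feature map.

    feat: array of shape (C, H, W)
    w_q, w_k, w_v: projection matrices of shape (C, D)
    Returns (output of shape (D, H, W), attention weights of shape (H*W, H*W)).
    """
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T              # (H*W, C): one token per location
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])        # pairwise similarity, (H*W, H*W)
    weights = softmax(scores, axis=-1)             # each row sums to 1
    out = weights @ v                              # weighted mix of values, (H*W, D)
    return out.T.reshape(-1, h, w), weights

# Toy usage: an 8-channel 4x4 feature map with random weights.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
w_q, w_k, w_v = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
out, weights = self_attention_on_feature_map(feat, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Because the attention matrix has one row and one column per spatial location, its cost grows quadratically with the feature map's resolution.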
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Layer Normalization · Byte Pair Encoding · Dropout · Label Smoothing
