TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios

Xingkui Zhu; Shuchang Lyu; Xu Wang; Qi Zhao

arXiv:2108.11539·cs.CV·August 31, 2021·123 cites

TL;DR

This paper introduces TPH-YOLOv5, an enhanced object detection model for drone scenarios that incorporates transformer prediction heads, attention mechanisms, and various strategies to improve detection accuracy on challenging drone-captured images.

Contribution

The paper proposes a novel YOLOv5-based model with transformer prediction heads and attention modules, achieving state-of-the-art performance on drone datasets.

Findings

01. TPH-YOLOv5 outperforms the previous state-of-the-art (SOTA) method by 1.81% AP.
02. The model achieves 39.18% AP on the DET-test-challenge dataset.
03. TPH-YOLOv5 improves on the baseline YOLOv5 by about 7% AP.

Abstract

Object detection in drone-captured scenarios has recently become a popular task. Because drones fly at varying altitudes, object scale varies drastically, which makes the networks harder to optimize. Moreover, high-speed, low-altitude flight introduces motion blur on densely packed objects, which makes those objects hard to distinguish. To address these two issues, we propose TPH-YOLOv5. Building on YOLOv5, we add one more prediction head to detect objects at different scales. We then replace the original prediction heads with Transformer Prediction Heads (TPH) to explore the prediction potential of the self-attention mechanism. We also integrate the convolutional block attention module (CBAM) to find attention regions in scenarios with dense objects. To further improve TPH-YOLOv5, we provide a bag of useful strategies such as data augmentation,…
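The abstract mentions integrating CBAM to find attention regions in dense scenes. The NumPy sketch below shows CBAM's two sequential steps as defined in the original CBAM formulation: channel attention from a shared two-layer MLP over average- and max-pooled descriptors, followed by spatial attention from a 7×7 convolution over channel-wise average and max maps. It assumes a single image (no batch axis), no biases, a reduction ratio `r = 4`, and random weights; these are illustrative choices, not the TPH-YOLOv5 configuration, and where the authors insert CBAM in the network is not shown.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w0, w1):
    """Reweight channels of feat (C, H, W) using a shared two-layer MLP.

    w0: (C // r, C) and w1: (C, C // r) are the MLP weights (no biases, assumed).
    """
    def mlp(v):
        return w1 @ np.maximum(w0 @ v, 0.0)  # ReLU between the two layers

    avg = feat.mean(axis=(1, 2))  # (C,) average-pooled descriptor
    mx = feat.max(axis=(1, 2))    # (C,) max-pooled descriptor
    scale = sigmoid(mlp(avg) + mlp(mx))
    return feat * scale[:, None, None]


def spatial_attention(feat, kernel):
    """Reweight positions of feat (C, H, W) with a k x k conv over 2 pooled maps."""
    desc = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    _, h, w = feat.shape
    logits = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return feat * sigmoid(logits)[None, :, :]


def cbam(feat, w0, w1, kernel):
    """Channel attention first, then spatial attention (CBAM's sequential order)."""
    return spatial_attention(channel_attention(feat, w0, w1), kernel)


# Usage with hypothetical sizes and random weights.
rng = np.random.default_rng(0)
C, H, W, r = 16, 8, 8, 4
feat = rng.standard_normal((C, H, W))
w0 = 0.1 * rng.standard_normal((C // r, C))
w1 = 0.1 * rng.standard_normal((C, C // r))
kernel = 0.1 * rng.standard_normal((2, 7, 7))
out = cbam(feat, w0, w1, kernel)
print(out.shape)  # (16, 8, 8)
```

Because both attention maps pass through a sigmoid, CBAM only rescales features element-wise and leaves the tensor shape unchanged, so a module like this can sit between existing layers without altering the surrounding architecture.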

Taxonomy

Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Layer Normalization · Byte Pair Encoding · Dropout · Label Smoothing
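Multi-head attention, layer normalization, and the position-wise feed-forward layer listed above are the components of a standard transformer encoder block, which is one way to give prediction heads the self-attention the abstract describes for TPH. The NumPy sketch below runs one such block over a C×H×W feature map by treating each of the H·W spatial positions as a token. It rests on assumptions not taken from the TPH-YOLOv5 code: pre-norm ordering (as in ViT), layer norm without learned scale/shift, a ReLU feed-forward layer with 4× expansion, no biases, no dropout (inference only), no position encodings, and random weights.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each token over its feature dimension (no learned scale/shift)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, wq, wk, wv, wo, num_heads):
    """Scaled dot-product self-attention over tokens x (N, D), split into heads."""
    n, d = x.shape
    assert d % num_heads == 0, "D must be divisible by num_heads"
    dh = d // num_heads

    def split(t):  # (N, D) -> (heads, N, dh)
        return t.reshape(n, num_heads, dh).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, N, N)
    heads = attn @ v                                          # (heads, N, dh)
    return heads.transpose(1, 0, 2).reshape(n, d) @ wo


def encoder_block(feat, params, num_heads=4):
    """One pre-norm encoder block applied to a (C, H, W) feature map."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T  # H*W tokens, each a C-dim vector
    x = x + multi_head_attention(layer_norm(x), *params["attn"], num_heads)
    w1, w2 = params["ffn"]
    x = x + np.maximum(layer_norm(x) @ w1, 0.0) @ w2  # position-wise FFN
    return x.T.reshape(c, h, w)


# Usage with hypothetical sizes and random weights.
rng = np.random.default_rng(0)
C, H, W = 32, 6, 6


def init(fan_in, fan_out):
    return rng.standard_normal((fan_in, fan_out)) / np.sqrt(fan_in)


params = {
    "attn": tuple(init(C, C) for _ in range(4)),  # wq, wk, wv, wo
    "ffn": (init(C, 4 * C), init(4 * C, C)),
}
feat = rng.standard_normal((C, H, W))
out = encoder_block(feat, params, num_heads=4)
print(out.shape)  # (32, 6, 6)
```

Every spatial position attends to every other, so each head builds an (H·W)×(H·W) attention matrix and the cost grows quadratically with the feature map's spatial size.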