Small Object Detection by DETR via Information Augmentation and Adaptive Feature Fusion
Ji Huang, Hui Wang

TL;DR
This paper enhances small object detection in real-time models by introducing detailed feature augmentation and adaptive multi-scale feature fusion, significantly improving detection accuracy without sacrificing speed.
Contribution
It proposes a fine-grained path augmentation and an adaptive feature fusion method to improve small object detection in RT-DETR models.
Findings
Improved small object detection accuracy in RT-DETR.
Effective integration of detailed and semantic features.
Enhanced multi-scale feature fusion performance.
Abstract
The main challenge for small object detection algorithms is to maintain accuracy while achieving real-time performance. The RT-DETR model performs well in real-time object detection, but its accuracy on small objects is poor. To compensate for this shortcoming, this study proposes two key improvements. First, the Transformer in RT-DETR receives input solely from the final layer of backbone features. This means that the Transformer's input carries only semantic information from the highest level of abstraction in the deep network, and ignores detailed information such as edges, texture, or color gradients at lower levels of abstraction, which is critical for locating small objects. Including only deep features can introduce additional background noise. This can have a negative impact on the…
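The abstract is cut off before the fusion method is described, so the paper's actual design is not available here. As a rough illustration of the general idea of adaptive multi-scale feature fusion, the sketch below combines a shallow, high-resolution feature map (detail such as edges and texture) with a deep, low-resolution one (semantics) using softmax-normalized weights. Everything in it is an assumption made for illustration: the function names `upsample_nearest` and `adaptive_fuse`, the use of NumPy instead of a deep learning framework, nearest-neighbour upsampling, and one scalar weight per scale. In a trained model the logits would be learned parameters.

```python
import numpy as np


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def adaptive_fuse(features, logits):
    """Fuse (C, H_i, W_i) maps at the resolution of the first (finest) map.

    Hypothetical helper: `logits` holds one scalar per scale and stands in
    for learnable fusion weights. Assumes every map has the same channel
    count and that spatial sizes differ by integer factors.
    """
    target_h = features[0].shape[1]
    resized = [upsample_nearest(f, target_h // f.shape[1]) for f in features]
    weights = softmax(np.asarray(logits, dtype=float))
    fused = sum(w * f for w, f in zip(weights, resized))
    return fused, weights


# Toy inputs: a shallow 8x8 map and a deep 2x2 map, 4 channels each.
shallow = np.ones((4, 8, 8))
deep = np.full((4, 2, 2), 3.0)

# Equal logits give equal weights, so the result is a plain average.
fused, weights = adaptive_fuse([shallow, deep], logits=[0.0, 0.0])
print(fused.shape, weights)
```

Raising the first logit relative to the second shifts the fused map toward the shallow, detail-rich features, which is the kind of trade-off an adaptive scheme would learn per task.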
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Absolute Position Encodings · Layer Normalization · Softmax · Residual Connection · Linear Layer · Byte Pair Encoding · Dropout
