Cross-DINO: Cross the Deep MLP and Transformer for Small Object Detection
Guiping Cao, Wenjian Huang, Xiangyuan Lan, Jianguo Zhang, Dongmei Jiang, and Yaowei Wang

TL;DR
Cross-DINO combines a deep MLP with a Transformer encoder and adds a new module and a new loss function, substantially improving small object detection performance across several datasets.
Contribution
The paper proposes Cross-DINO, a new approach that integrates a deep MLP, a Cross Coding Twice Module, and a Boost Loss with a new soft label to enhance small object detection.
Findings
Achieves 36.4% APs (COCO average precision on small objects), outperforming DINO by 4.4%.
Requires only 45M parameters, fewer FLOPs, and fewer than 12 training epochs.
Demonstrates effectiveness across multiple datasets, including COCO, WiderPerson, VisDrone, AI-TOD, and SODA-D.
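The APs number refers to COCO's size-stratified average precision. As a rough illustration that is not part of Cross-DINO itself, the sketch below assigns annotations to COCO size buckets using the thresholds from the COCO detection evaluation guidelines: small below 32² pixels of area, medium below 96², large otherwise. Assumptions: the helper names coco_size_bucket and small_objects are hypothetical; area is the annotation's area field (for COCO instance annotations this is the segmentation mask area, not necessarily box width times height); and an area of exactly 32² or 96² goes to the larger bucket here, while the official evaluator's boundary handling may differ. The sketch only shows which ground-truth objects count as "small"; the full APs computation also ignores out-of-range objects and unmatched detections during matching.

```python
from typing import Any, Dict, List

# Area thresholds (pixels^2) from the COCO detection evaluation guidelines.
SMALL_MAX = 32 ** 2   # small: area < 32^2
MEDIUM_MAX = 96 ** 2  # medium: 32^2 <= area < 96^2 (boundary choice is an assumption)


def coco_size_bucket(area: float) -> str:
    """Hypothetical helper: map an annotation area to a COCO size category."""
    if area < SMALL_MAX:
        return "small"
    if area < MEDIUM_MAX:
        return "medium"
    return "large"


def small_objects(annotations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep only annotations that would count toward APs."""
    return [ann for ann in annotations if coco_size_bucket(ann["area"]) == "small"]


if __name__ == "__main__":
    anns = [
        {"id": 1, "area": 150.0},    # about 12x12 px: small
        {"id": 2, "area": 5000.0},   # about 70x70 px: medium
        {"id": 3, "area": 12000.0},  # about 110x110 px: large
    ]
    for ann in anns:
        print(ann["id"], coco_size_bucket(ann["area"]))
    print([ann["id"] for ann in small_objects(anns)])
```

Running the script prints the bucket for each annotation and then [1], the only object small enough to contribute to APs. This makes concrete why SOD is hard: an object under 32² pixels covers fewer than 1,024 pixels in total, leaving little visual information for the detector to work with.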
Abstract
Small Object Detection (SOD) poses significant challenges due to limited information and the model's low class prediction scores. While Transformer-based detectors have shown promising performance, their potential for SOD remains largely unexplored. In typical DETR-like frameworks, the CNN backbone network, which specializes in aggregating local information, struggles to capture the contextual information necessary for SOD. The multiple attention layers in the Transformer Encoder have difficulty attending effectively to small objects and can also blur features. Furthermore, the model's class prediction scores are lower for small objects than for large objects, which further increases the difficulty of SOD. To address these challenges, we introduce a novel approach called Cross-DINO. This approach incorporates the deep MLP network to aggregate initial feature representations with both…
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
