# A Scale-Adaptive Aggregation and Multi-Domain Feature Fusion Architecture for Small-Target Detection in UAV Aerial Imagery

**Authors:** Zhiwei Sun, Guanglei Zhang, Yuxin Xing, Yuliang Liu

PMC · DOI: 10.3390/s26051610 · Sensors (Basel, Switzerland) · 2026-03-04

## TL;DR

This paper introduces MSCM-YOLO, a YOLOv11-based framework for detecting small objects in UAV imagery. It improves detection accuracy while keeping the computational cost suitable for UAV deployment.

## Contribution

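MSCM-YOLO extends YOLOv11 for small-target detection in UAV aerial imagery with four components: a dedicated P2 detection head, a lightweight backbone enhanced with Mobile Bottleneck Convolution (MBConv), a Scale-Adaptive Attention Fusion (SAF) mechanism with a Channel-Adaptive Projection (CAP) module, and a Multi-Domain Feature Attention Fusion (MDFAF) module.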
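
To give a rough sense of why MBConv suits a lightweight backbone, the sketch below counts the weights of one generic MBConv block (1x1 expansion, depthwise k x k convolution, squeeze-and-excitation, 1x1 projection) as used in MobileNetV2/EfficientNet. The summary does not give the paper's block configuration, so the expansion ratio of 4, the SE ratio of 0.25, and the helper names `mbconv_params` and `dense_conv_params` are illustrative assumptions; biases and BatchNorm are ignored.

```python
def mbconv_params(c_in, c_out, expand=4, k=3, se_ratio=0.25):
    """Weight count of one generic MBConv block (assumed config; biases/BN ignored)."""
    hidden = c_in * expand                    # width after the expansion step
    squeeze = max(1, int(c_in * se_ratio))    # SE bottleneck width (EfficientNet-style)
    parts = {
        "expand": c_in * hidden if expand != 1 else 0,  # 1x1 pointwise expansion
        "depthwise": hidden * k * k,                    # one k x k filter per channel
        "se": 2 * hidden * squeeze,                     # two SE fully connected layers
        "project": hidden * c_out,                      # 1x1 linear projection
    }
    parts["total"] = sum(parts.values())
    return parts


def dense_conv_params(channels, k=3):
    """Weights of an ordinary k x k convolution with equal input/output channels."""
    return channels * channels * k * k


p = mbconv_params(64, 64)
print(p)
# The depthwise step replaces a dense k x k conv at the expanded width:
print("dense 3x3 at hidden width:", dense_conv_params(64 * 4))
```

With 64 input and output channels, the spatial (depthwise) stage costs 2,304 weights, versus 589,824 for a dense 3x3 convolution at the same 256-channel hidden width; most of the block's budget sits in the cheap 1x1 layers.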

## Key findings

- On VisDrone2019, MSCM-YOLO reaches an mAP50 of 44.41% and an mAP50:95 of 27.13%, outperforming the YOLOv11 baseline by 10.77 and 7.22 percentage points, respectively.
- These gains come with a balanced computational profile that the authors describe as suitable for UAV deployment.
- Additional experiments on the UAVDT, DIOR, and AI-TOD datasets show consistent mAP50 improvements, supporting the method's robustness and generalization.
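
The reported VisDrone2019 scores and percentage-point gains imply the YOLOv11 baseline values. The short calculation below recovers them and expresses each gain relative to the baseline. These derived figures are simple arithmetic on the abstract's numbers, not values quoted by the authors, and the names `REPORTED` and `implied_baseline` are illustrative.

```python
# (MSCM-YOLO score, gain over YOLOv11 in percentage points), VisDrone2019, from the abstract
REPORTED = {
    "mAP50": (44.41, 10.77),
    "mAP50:95": (27.13, 7.22),
}


def implied_baseline(reported):
    """Return {metric: (baseline score %, relative gain %)} derived from score - gain."""
    out = {}
    for metric, (score, gain_pp) in reported.items():
        base = round(score - gain_pp, 2)        # baseline = new score - absolute gain
        rel = round(100 * gain_pp / base, 1)    # gain as a percentage of the baseline
        out[metric] = (base, rel)
    return out


for metric, (base, rel) in implied_baseline(REPORTED).items():
    print(f"{metric}: baseline {base:.2f}%, relative gain {rel:.1f}%")
```

This gives an implied baseline of 33.64% mAP50 and 19.91% mAP50:95, so the gains correspond to roughly 32.0% and 36.3% relative improvements.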

## Abstract

Vision-based unmanned aerial vehicles (UAVs) have been widely studied and applied in aerial monitoring tasks; however, detecting small objects in UAV imagery remains challenging due to limited visual features, significant scale variations, dense distributions, and complex background interference. In real-world UAV scenarios, small objects often occupy only a few pixels and are easily obscured by cluttered backgrounds, which complicates stable and accurate detection. To address these issues, this study proposes MSCM-YOLO, a UAV-oriented lightweight detection framework based on YOLOv11. The framework integrates four key innovations: (1) a dedicated P2 detection head to preserve high-resolution features for extremely small and dense targets; (2) a lightweight backbone enhanced with Mobile Bottleneck Convolution (MBConv) to improve feature extraction for visually weak objects; (3) a Scale-Adaptive Attention Fusion (SAF) mechanism with a Channel-Adaptive Projection (CAP) module to effectively integrate multi-scale spatial and semantic features under large object-size variations; and (4) a Multi-Domain Feature Attention Fusion (MDFAF) module to enhance target–background discrimination in complex UAV scenes. Experiments on the VisDrone2019 dataset show that MSCM-YOLO achieves mAP50 and mAP50:95 scores of 44.41% and 27.13%, respectively, outperforming the YOLOv11 baseline by 10.77 and 7.22 percentage points. Notably, the proposed framework achieves this significant performance improvement while maintaining a balanced computational profile suitable for UAV deployment. Additional validation on the UAVDT, DIOR, and AI-TOD datasets confirms consistent improvements in mAP50, demonstrating the robustness and generalization ability of the proposed method. Overall, MSCM-YOLO provides an effective and practical solution for accurate small object detection in aerial monitoring applications.
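The motivation for the P2 head can be illustrated with grid arithmetic. In the common YOLO convention, P3, P4, and P5 predict at strides 8, 16, and 32, and a P2 level adds stride 4. The sketch below assumes a 640x640 input (the Ultralytics default training size; the paper's input size is not stated in this summary) and shows how many grid cells a small object spans at each level, together with the number of prediction locations, which is where the extra compute of a P2 head comes from. The names `STRIDES`, `grid_sizes`, `cells_spanned`, and `locations` are hypothetical.

```python
# Assumed YOLO-style pyramid: level name -> downsampling stride
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}


def grid_sizes(img_size, strides=STRIDES):
    """Side length of the prediction grid at each level for a square input."""
    return {lvl: img_size // s for lvl, s in strides.items()}


def cells_spanned(obj_px, strides=STRIDES):
    """How many grid cells an object of width obj_px covers along one axis."""
    return {lvl: obj_px / s for lvl, s in strides.items()}


def locations(img_size, strides=STRIDES):
    """Number of prediction locations per level (grid side squared)."""
    return {lvl: n * n for lvl, n in grid_sizes(img_size, strides).items()}


print(grid_sizes(640))     # per-level grid side length
print(cells_spanned(12))   # a 12-pixel-wide object
print(locations(640))      # dense prediction positions per level
```

Under these assumptions, a 12-pixel object spans 3 cells at P2 but only 1.5 at P3 and less than one cell at P4 and P5, while P2 has 25,600 prediction locations, four times as many as P3. This is the accuracy-versus-compute trade-off a P2 head introduces.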

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12986596/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12986596/full.md

## References

57 references — full list in the complete paper: https://tomesphere.com/paper/PMC12986596/full.md

---
Source: https://tomesphere.com/paper/PMC12986596