# A collaborative multi-attention network for real-time small object detection in UAV imagery

**Authors:** Jianxiu Yang, Xiangmei Yue, Liang Wu

PMC · DOI: 10.1038/s41598-026-36440-2 · Scientific Reports · 2026-01-20

## TL;DR

This paper introduces CMA-Net, a collaborative multi-attention network for detecting small objects in drone (UAV) imagery that aims to improve detection accuracy while keeping inference fast enough for real-time use.

## Contribution

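CMA-Net combines three modules for small object detection in UAV imagery: an efficient bi-directional feature pyramid (E-BiFPN) for weighted multi-scale feature fusion, a Dual-Dimensional Channel Attention (DDCA) module that recalibrates channel significance along the width and height dimensions, and a Multi-Scale Foreground Attention (MSFA) module that strengthens foreground features and suppresses background interference across feature layers.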

## Key findings

- CMA-Net achieves 67.2% accuracy on the UAVDT dataset and 62.0% on the Stanford Drone dataset.
- The method operates at 64 frames per second, meeting real-time requirements.
- It outperforms existing methods in small object detection and background suppression.
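To make the reported throughput concrete, the worked arithmetic below converts 64 frames per second into a per-frame time budget. It assumes frames are processed one after another; the hardware behind the figure is not stated in this summary. The 30 fps video rate used for comparison is an assumed common reference, not a number from the paper.

$$
t_{\text{frame}} = \frac{1}{64\ \text{frames/s}} = 0.015625\ \text{s} = 15.625\ \text{ms},
\qquad
t_{30\,\text{fps}} = \frac{1}{30\ \text{frames/s}} \approx 33.3\ \text{ms}
$$

Under these assumptions, each frame is processed in well under the interval at which a 30 fps stream delivers new frames.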

## Abstract

To address the challenges of detecting small objects in unmanned aerial vehicle (UAV) imagery, such as weak feature representation, complex background interference, and high real-time requirements, this paper proposes a Collaborative Multi-Attention Network (CMA-Net) for real-time small object detection. The network incorporates an efficient bi-directional feature pyramid structure (E-BiFPN) to achieve multi-scale weighted feature fusion while minimizing parameter count and computational cost. A Dual-Dimensional Channel Attention (DDCA) module is further introduced, which adaptively recalibrates channel significance along the width and height dimensions, capturing long-range dependencies and improving spatial sensitivity. Additionally, a Multi-Scale Foreground Attention (MSFA) module is designed to explore inter-object correlations across different feature layers, enhancing foreground representation, suppressing background interference, and improving feature discriminability for small objects. By integrating E-BiFPN, DDCA, and MSFA, CMA-Net achieves collaborative feature enhancement and significantly boosts overall discriminative power. Experimental results demonstrate that the proposed method achieves accuracies of 67.2% and 62.0% on the public UAVDT and Stanford Drone datasets, respectively, while operating at 64 frames per second, meeting real-time inference requirements.
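The abstract describes E-BiFPN's weighted multi-scale fusion and DDCA's width/height channel recalibration only in words. The minimal sketch below illustrates both ideas under stated assumptions and is not the authors' implementation. It assumes E-BiFPN keeps the "fast normalized fusion" of the original BiFPN (EfficientDet), where non-negative learnable weights are divided by their sum plus a small epsilon. It also models DDCA as a parameter-free gating in the spirit of coordinate attention: each channel is averaged along width and along height, the two profiles are passed through a sigmoid, and the input is rescaled by their product. The function names, the nested-list feature layout, and the omission of learned layers are all hypothetical simplifications. MSFA is not sketched because the summary gives too little detail.

```python
import math
from typing import List

# Hypothetical layout: a feature map is a nested list indexed as fmap[c][h][w].
FeatureMap = List[List[List[float]]]


def _shape(fmap: FeatureMap):
    return len(fmap), len(fmap[0]), len(fmap[0][0])


def fast_normalized_fusion(inputs: List[FeatureMap], raw_weights: List[float],
                           eps: float = 1e-4) -> FeatureMap:
    """BiFPN-style weighted fusion (assumed, not confirmed, for E-BiFPN).

    Each raw weight is clipped at zero (ReLU), then divided by the sum of all
    clipped weights plus eps; the output is the weighted sum of the inputs.
    """
    if len(inputs) != len(raw_weights):
        raise ValueError("need exactly one weight per input feature map")
    shape = _shape(inputs[0])
    if any(_shape(f) != shape for f in inputs):
        raise ValueError("all inputs must already be resized to the same shape")
    clipped = [max(0.0, w) for w in raw_weights]   # ReLU keeps weights non-negative
    total = sum(clipped) + eps                     # eps avoids division by zero
    norm = [w / total for w in clipped]            # normalized fusion weights
    C, H, W = shape
    return [[[sum(n * f[c][h][x] for n, f in zip(norm, inputs))
              for x in range(W)]
             for h in range(H)]
            for c in range(C)]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def dual_dimensional_channel_attention(fmap: FeatureMap) -> FeatureMap:
    """Parameter-free sketch of width/height channel recalibration (hypothetical).

    For each channel, averaging over the width gives one gate per row and
    averaging over the height gives one gate per column; the input is scaled
    by both gates. A trained module would add learned layers before the sigmoid.
    """
    C, H, W = _shape(fmap)
    out: FeatureMap = []
    for c in range(C):
        row_gate = [sigmoid(sum(fmap[c][h]) / W) for h in range(H)]
        col_gate = [sigmoid(sum(fmap[c][h][x] for h in range(H)) / H)
                    for x in range(W)]
        out.append([[fmap[c][h][x] * row_gate[h] * col_gate[x]
                     for x in range(W)]
                    for h in range(H)])
    return out


if __name__ == "__main__":
    top_down = [[[1.0, 1.0], [1.0, 1.0]]]   # 1 channel, 2x2
    lateral = [[[3.0, 3.0], [3.0, 3.0]]]
    print(fast_normalized_fusion([top_down, lateral], [1.0, 1.0]))
    print(dual_dimensional_channel_attention([[[0.0, 2.0], [0.0, 2.0]]]))
```

The ReLU-then-normalize choice matters. For example, raw weights `[-1.0, 1.0]` effectively drop the first input, because its clipped weight is zero. The attention sketch leaves zero activations at zero and scales positions in high-response rows and columns more strongly than the rest.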

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12894738/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12894738/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/PMC12894738/full.md

---
Source: https://tomesphere.com/paper/PMC12894738