# Real-Time Intelligent Detection Algorithm for Ship Targets in High-Resolution Wide-Swath Sea Surface Images Captured by Airborne Cameras

**Authors:** Haiying Liu, Qiang Fu, Haoyu Wang, Huaide Zhou, Yingchao Li, Huilin Jiang

PMC · DOI: 10.3390/s26061786 · Sensors (Basel, Switzerland) · 2026-03-12

## TL;DR

A lightweight YOLOv8-based model detects ships in aerial images with 94.55% mAP@0.5 and processes large-format images (≥300 MB) at ≥2 fps on an RK3588 embedded system.

## Contribution

The novel Multi-Scale Fusion (MSF) module and Group-Wise Scale Fusion Neck (GSF-Neck) enhance YOLOv8 for efficient multi-scale ship detection in aerial imagery.

## Key findings

- The model achieves a 94.55% mAP@0.5 for ship detection in aerial imagery.
- It processes large-format images (≥300 MB) at ≥2 fps on an RK3588 embedded system.
- The model improves mAP by 1.4% with only a 6.6% decrease in FPS compared to the baseline.
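
For readers less familiar with the metric, mAP@0.5 counts a predicted box as a correct detection when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below illustrates only that matching criterion. It is not the paper's evaluation code, and the `(x1, y1, x2, y2)` box layout and the function names `iou` and `is_true_positive` are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Matching rule behind the '@0.5' suffix: IoU must reach the threshold."""
    return iou(pred_box, gt_box) >= threshold
```

Average precision is then computed from the precision-recall curve that results from applying this rule to predictions ranked by confidence.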

## Abstract

What are the main findings?

- The proposed lightweight YOLOv8 model, enhanced with a Multi-Scale Fusion (MSF) module and Group-Wise Scale Fusion Neck (GSF-Neck), achieves a high mAP@0.5 of 94.55% for ship detection in aerial imagery, significantly outperforming baseline and mainstream detectors. Compared with the baseline under identical conditions, the proposed model improves mAP by 1.4% with only a 6.6% decrease in FPS, achieving a balanced trade-off between detection accuracy and computational efficiency.
- The optimized model effectively processes single large-format aerial images (≥300 MB) in real time at ≥2 fps on an RK3588 embedded system, with a detection rate of ≥89.5% in aerial tests.


What are the implications of the main findings?

- This work proposes a highly efficient and practical embedded vision solution that enables real-time ship detection on aerial platforms for maritime surveillance, thereby eliminating the need for cloud processing.
- Moreover, the MSF and GSF-Neck modules introduced in this study provide a reproducible design framework for balancing multi-level feature extraction against computational efficiency, which can be extended to other object detection tasks on resource-constrained platforms.


Ship detection in aerial imagery is critical for maritime monitoring, but achieving real-time performance on embedded platforms remains difficult. The difficulty arises from the large data volume inherent in wide-format aerial images and the pronounced scale variations among vessels. To address these challenges, an optimized YOLOv8-based model is proposed. Scale adaptability is enhanced by incorporating a Multi-Scale Fusion (MSF) module into the backbone. In addition, a lightweight Group-Wise Scale Fusion Neck (GSF-Neck) with a parallel multi-branch structure is designed to facilitate adaptive multi-scale feature fusion while reducing computational overhead. The proposed model achieves a state-of-the-art mAP@0.5 of up to 94.55% on a dedicated aerial ship dataset, outperforming other major detectors. When deployed on an RK3588 embedded system using a sliding window strategy to process single 300 MB images, it maintains a stable processing speed of ≥2 fps. Compared to the baseline under identical conditions, the proposed model improves mAP by 1.4% with a 6.6% reduction in FPS, effectively balancing detection performance and computational efficiency.
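
The sliding window strategy mentioned above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the tile size of 640 pixels, the overlap of 64 pixels, and the helper names `tile_starts`, `sliding_windows`, and `to_global` are all assumptions, since this summary does not state the actual settings.

```python
from typing import Iterator, List, Tuple


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of tiles that cover [0, length) along one axis."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile sits flush with the image border
    return starts


def sliding_windows(
    width: int, height: int, tile: int = 640, overlap: int = 64
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) windows in row-major order (assumed tile settings)."""
    w, h = min(tile, width), min(tile, height)
    for y in tile_starts(height, tile, overlap):
        for x in tile_starts(width, tile, overlap):
            yield (x, y, w, h)


def to_global(box: Tuple[float, float, float, float], x0: int, y0: int):
    """Shift a tile-local (x1, y1, x2, y2) box into full-image coordinates."""
    x1, y1, x2, y2 = box
    return (x1 + x0, y1 + y0, x2 + x0, y2 + y0)
```

Each window would be cropped and passed to the detector, and the resulting boxes shifted back with `to_global`. Ships that fall in an overlap region can be detected twice, so a merging step such as non-maximum suppression over the combined boxes would typically follow.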

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13030454/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13030454/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/PMC13030454/full.md

---
Source: https://tomesphere.com/paper/PMC13030454