# MSF-DETR: A small target detection algorithm for sonar images based on spatial-frequency domain collaborative feature fusion

**Authors:** Heng Zhao, Shuping Han, Jiaying Geng, Yubo Han, Shuyang Jia, Ke Li, Xuebo Zhang

PMC · DOI: 10.1371/journal.pone.0336468 · 2025-11-14

## TL;DR

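This paper introduces MSF-DETR, an end-to-end detector for small targets in side-scan sonar images that fuses spatial and frequency-domain features. On the SSST-3K dataset it reaches 78.5% mAP50 at 71.2 FPS, with 12.0% lower computational complexity than the RT-DETR baseline.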

## Contribution

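MSF-DETR is an end-to-end detection algorithm for small targets in side-scan sonar images. It combines three components: the MASNet backbone, which captures spatial geometric features and frequency-domain texture features together; a Hierarchical Multi-scale Adaptive Feature Pyramid Network, which allocates weights adaptively across scales; and an Efficient Sparse Attention Transformer Encoder, whose window-based sparse self-attention reduces attention complexity from quadratic to linear.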
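To make the idea of spatial-frequency fusion concrete, here is a minimal NumPy sketch. It is not the authors' MASNet or their convolution module: the function names, the 3x3 mean filter standing in for a learned spatial convolution, the `gain` array standing in for learned frequency weights, and the fixed blending factor `alpha` are all illustrative assumptions.

```python
import numpy as np


def spatial_branch(x: np.ndarray) -> np.ndarray:
    """3x3 mean filter with edge padding (stand-in for a learned spatial conv)."""
    h, w = x.shape
    padded = np.pad(x, 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    # Sum the nine shifted copies of the image, then average.
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def frequency_branch(x: np.ndarray, gain: np.ndarray) -> np.ndarray:
    """Scale the 2-D spectrum by `gain` (hypothetical stand-in for learned weights)."""
    spectrum = np.fft.rfft2(x)          # shape: (H, W // 2 + 1)
    return np.fft.irfft2(spectrum * gain, s=x.shape)


def fuse(x: np.ndarray, gain: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend the two branches; `alpha` is an assumed fixed mixing weight."""
    return alpha * spatial_branch(x) + (1.0 - alpha) * frequency_branch(x, gain)


rng = np.random.default_rng(0)
image = rng.random((8, 8))
identity_gain = np.ones((8, 8 // 2 + 1))   # leaves the spectrum unchanged
fused = fuse(image, identity_gain)
```

With `identity_gain`, the frequency branch returns the input unchanged, so the result is simply a blend of the image and its smoothed version. A trained model would presumably learn both the spatial kernel and the frequency weights rather than fixing them as done here.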

## Key findings

- MSF-DETR achieves 78.5% mAP50 on the SSST-3K dataset, outperforming baseline RT-DETR by 2.8%.
- It reduces computational complexity by 12.0% relative to RT-DETR and runs at 71.2 FPS.
- It achieves 38.5% mAP50-95 on the SSST-3K dataset, a 3.3% improvement over RT-DETR.

## Abstract

Side-scan sonar imaging is essential for underwater target detection in marine exploration and engineering applications, yet small target detection faces significant challenges including limited frequency domain feature utilization, insufficient multi-scale feature fusion, and high computational complexity. This study develops the Multi-Scale Spatial-Frequency Collaborative Detection Transformer (MSF-DETR), a novel end-to-end automatic detection algorithm specifically designed for small targets in side-scan sonar images. The method integrates three core innovations: a Multi-domain Adaptive Spatial-frequency Network (MASNet) backbone employing Cascaded dual-domain Mamba-enhanced Spatial-frequency Synergistic Convolution that simultaneously captures spatial geometric and frequency domain texture features; a Hierarchical Multi-scale Adaptive Feature Pyramid Network implementing intelligent weight allocation across different scales; and an Efficient Sparse Attention Transformer Encoder utilizing a Window-based Adaptive Sparse Self-Attention mechanism that reduces computational complexity from quadratic to linear. Experimental validation was conducted on the self-built SSST-3K (Side-Scan Sonar Target Detection 3K Dataset) dataset, containing approximately 3000 high-quality sonar images, and on the public KLSG dataset. Results demonstrate that MSF-DETR achieves 78.5% mAP50 and 38.5% mAP50-95 on the SSST-3K dataset, representing improvements of 2.8% and 3.3% respectively compared to baseline RT-DETR, while reducing computational complexity by 12.0% and achieving 71.2 FPS inference speed. The proposed MSF-DETR provides an effective solution for small target detection in complex marine environments, significantly advancing underwater sonar image processing technology.
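The quadratic-to-linear claim for window-based attention can be illustrated with a small NumPy sketch. This is a generic non-overlapping window attention, not the paper's Window-based Adaptive Sparse Self-Attention: it assumes identity query/key/value projections, a fixed window size, and no adaptive sparsification, and the helper names are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def window_self_attention(tokens: np.ndarray, window: int) -> np.ndarray:
    """Self-attention restricted to non-overlapping windows of `window` tokens.

    Assumption: queries, keys and values are the tokens themselves
    (no learned projections), to keep the sketch short.
    """
    n, d = tokens.shape
    if n % window != 0:
        raise ValueError("sequence length must be a multiple of the window size")
    blocks = tokens.reshape(n // window, window, d)
    # One (window x window) score matrix per block instead of one (n x n) matrix.
    scores = blocks @ blocks.transpose(0, 2, 1) / np.sqrt(d)
    return (softmax(scores) @ blocks).reshape(n, d)


def score_count(n: int, window=None) -> int:
    """Number of pairwise attention scores computed for n tokens."""
    if window is None:
        return n * n                        # full attention: quadratic in n
    return (n // window) * window * window  # windowed: n * window, linear in n


rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8))
y = window_self_attention(x, window=16)
```

For a fixed window size `w`, the number of scores is `n * w` rather than `n * n`, so doubling the sequence length doubles the cost instead of quadrupling it. The trade-off in this plain form is that tokens in different windows never attend to each other; how the paper's mechanism addresses that is not described in the abstract.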

## Figures

50 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12617901/full.md

---
Source: https://tomesphere.com/paper/PMC12617901