# Hierarchical Local-Global Feature Fusion Network for Robust Ship Target Recognition in Complex Maritime Environment

**Authors:** Xuanhe Liu, Shuning Zhang, Si Chen, Jianchao Li, Yingying Luo

PMC · DOI: 10.3390/s26010029 · Sensors (Basel, Switzerland) · 2025-12-19

## TL;DR

HLGF-Net combines local CNN features with global Transformer features to improve ship recognition in cluttered maritime scenes, at low signal-to-noise ratios, and with limited training samples.

## Contribution

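The paper introduces HLGF-Net, a hierarchical local-global feature fusion network that integrates local structural cues from a CNN encoder with global semantic dependencies modeled by stacked Transformer blocks, for robust ship target recognition.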

## Key findings

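- HLGF-Net achieves higher classification accuracy and F1 scores than traditional CNNs, pure Transformer models, and representative recent vision architectures on both the FUSAR dataset and a measured dataset.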
- The model is robust under low signal-to-noise ratios and limited sample conditions.
- It effectively integrates local and global features for maritime target recognition.
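
The two metrics named in these findings can be made concrete with a short sketch. It assumes macro-averaged F1 (the summary does not say which averaging the authors use), and the class names and predictions are hypothetical illustration data, not results from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list.

    Macro averaging is an assumption made for this sketch.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for illustration only.
y_true = ["cargo", "cargo", "tanker", "fishing", "tanker", "fishing"]
y_pred = ["cargo", "tanker", "tanker", "fishing", "tanker", "cargo"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Reporting F1 alongside accuracy matters when ship classes are imbalanced, because a rare class that is always misclassified lowers macro F1 far more than it lowers accuracy.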

## Abstract

What are the main findings?
This paper proposes a hierarchical local-global feature fusion model that integrates local structural features extracted by convolutional neural networks with global semantic dependencies modeled by Transformer architectures through a progressive multilayer self-attention mechanism.

Extensive experiments on both the FUSAR dataset and a measured dataset demonstrate that the proposed model achieves superior classification accuracy and F1 scores compared with traditional CNNs, pure Transformer models, and representative recent vision architectures, while maintaining competitive inference efficiency. The model also exhibits strong robustness under low signal-to-noise ratios and limited sample conditions.
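
One common way to probe robustness at low signal-to-noise ratio is to corrupt test inputs with noise scaled to a target SNR. The sketch below assumes additive white Gaussian noise on a 1-D sample list; the summary does not state the noise model or SNR levels the authors used, so the function names, the sine test signal, and the 5 dB value are all illustrative assumptions.

```python
import math
import random


def mean_power(samples):
    """Average squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def add_noise_at_snr(signal, snr_db, rng):
    """Return signal plus white Gaussian noise at the requested SNR in dB.

    SNR_dB = 10 * log10(P_signal / P_noise), so
    P_noise = P_signal / 10 ** (SNR_dB / 10).
    """
    if not signal:
        raise ValueError("signal must be non-empty")
    noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB, computed from the clean and corrupted signals."""
    noise = [n - c for c, n in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(noise))


rng = random.Random(0)  # fixed seed so the run is reproducible
clean = [math.sin(0.01 * i) for i in range(20000)]
noisy = add_noise_at_snr(clean, snr_db=5.0, rng=rng)
check = measured_snr_db(clean, noisy)  # close to 5 dB, up to sampling error
```

Sweeping `snr_db` downward and re-evaluating a fixed classifier at each level gives an accuracy-versus-SNR curve, which is the usual form for a robustness comparison of this kind.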

What are the implications of the main findings?
Hierarchical encoding of local structural features and global contextual dependencies provides a novel approach for extracting vessel target features under complex sea conditions, enhancing the reliability of maritime target recognition.

Transfer learning methods based on partial fine-tuning can efficiently adapt to limited labeled data, enabling rapid deployment of high-precision recognition systems in resource-constrained environments.
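
As a framework-free illustration of partial fine-tuning, the sketch below keeps a pretrained "backbone" fixed and updates only a small classification head by gradient descent. Which layers the authors freeze is not stated in this summary; the two-layer model, the weights, and the four training points are hypothetical.

```python
import math


def features(x, frozen_w):
    """Frozen backbone: a fixed linear map followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in frozen_w]


def predict(x, frozen_w, head_w, head_b):
    """Return (probability of class 1, backbone features) for one input."""
    h = features(x, frozen_w)
    z = sum(w * hi for w, hi in zip(head_w, h)) + head_b
    return 1.0 / (1.0 + math.exp(-z)), h


def mean_loss(data, frozen_w, head_w, head_b):
    """Mean binary cross-entropy over the dataset."""
    total = 0.0
    for x, y in data:
        p, _ = predict(x, frozen_w, head_w, head_b)
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def fine_tune_head(data, frozen_w, head_w, head_b, lr=0.5, epochs=200):
    """Full-batch gradient descent on the head only; frozen_w is never written."""
    head_w = list(head_w)
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(head_w)
        grad_b = 0.0
        for x, y in data:
            p, h = predict(x, frozen_w, head_w, head_b)
            err = p - y  # gradient of cross-entropy w.r.t. the logit
            grad_w = [g + err * hi for g, hi in zip(grad_w, h)]
            grad_b += err
        head_w = [w - lr * g / n for w, g in zip(head_w, grad_w)]
        head_b -= lr * grad_b / n
    return head_w, head_b


# Hypothetical "pretrained" backbone weights and a tiny labeled set.
frozen_w = [[1.0, -1.0], [0.5, 0.5]]
data = [([1.0, 0.0], 1), ([0.9, 0.2], 1), ([0.0, 1.0], 0), ([0.1, 0.8], 0)]
loss_before = mean_loss(data, frozen_w, [0.0, 0.0], 0.0)
head_w, head_b = fine_tune_head(data, frozen_w, [0.0, 0.0], 0.0)
loss_after = mean_loss(data, frozen_w, head_w, head_b)
```

Because only the head's few parameters are trained, a handful of labeled samples is enough to fit them without disturbing the pretrained features. In PyTorch the same idea is typically expressed by setting `requires_grad = False` on the parameters to be frozen.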

Accurate ship target recognition remains challenging in complex maritime environments due to background clutter, multiscale target appearance, and limited discriminative features extracted by single-type networks. To address these issues, this paper proposes a hierarchical local-global feature fusion network (HLGF-Net) that integrates local structural cues from a CNN encoder with global semantic dependencies modeled by a Transformer. The proposed model progressively constructs hierarchical dependencies through stacked Transformer blocks, enabling comprehensive integration of local structural details and global semantic context. This design enhances the capability to capture fine-grained local contours and long-range global contextual relationships simultaneously. Extensive experiments on ship recognition datasets demonstrate that HLGF-Net achieves superior performance compared with traditional CNNs, pure Transformers, and representative recent vision architectures, particularly under conditions of cluttered backgrounds, partial occlusion, and limited target samples. The proposed framework provides an effective solution for robust maritime target recognition and offers a general strategy for hierarchical local-global feature integration.
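
The general pattern described here, local convolutional features passed through stacked self-attention blocks, can be sketched in plain Python. This is a toy under stated assumptions, not the HLGF-Net architecture: the summary gives no layer configuration, so the single 3x3 filter, identity query/key/value projections, absence of normalization and feed-forward layers, patch size, and depth are all choices made for illustration.

```python
import math


def conv3x3(image, kernel):
    """Valid 3x3 cross-correlation followed by ReLU (local structural cues)."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            s = sum(kernel[a][b] * image[i + a][j + b]
                    for a in range(3) for b in range(3))
            row.append(max(s, 0.0))
        out.append(row)
    return out


def to_tokens(feature_map, patch):
    """Flatten non-overlapping patch x patch blocks into token vectors."""
    h, w = len(feature_map), len(feature_map[0])
    return [[feature_map[i + a][j + b] for a in range(patch) for b in range(patch)]
            for i in range(0, h - patch + 1, patch)
            for j in range(0, w - patch + 1, patch)]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_block(tokens):
    """Single-head scaled dot-product self-attention with a residual connection.

    Q, K and V are the tokens themselves (identity projections, an assumption).
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        mixed = [sum(wt * v[j] for wt, v in zip(weights, tokens)) for j in range(d)]
        out.append([qi + mi for qi, mi in zip(q, mixed)])  # residual add
    return out


def encode(image, kernel, patch=2, depth=3):
    """Local features -> tokens -> `depth` stacked attention blocks -> mean pool."""
    tokens = to_tokens(conv3x3(image, kernel), patch)
    per_layer = [tokens]
    for _ in range(depth):
        tokens = attention_block(tokens)
        per_layer.append(tokens)
    d = len(tokens[0])
    pooled = [sum(t[j] for t in tokens) / len(tokens) for j in range(d)]
    return pooled, per_layer


# Hypothetical 6x6 "image" and a vertical-edge filter.
image = [[float((r * c) % 5) for c in range(6)] for r in range(6)]
edge_kernel = [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
descriptor, per_layer = encode(image, edge_kernel)
```

The convolution sees only a 3x3 neighborhood, while each attention block lets every token draw on all others, so deeper blocks mix information over progressively wider context. A classifier would operate on `descriptor`, and `per_layer` exposes the intermediate token sets that a hierarchical fusion scheme could combine.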

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12787353/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12787353/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/PMC12787353/full.md

---
Source: https://tomesphere.com/paper/PMC12787353