# Paddy pest image segmentation based on multiscale attention fusion VM-UNet

**Authors:** Yunlong Zhang, Yu Shao, Ting Zhang

PMC · DOI: 10.3389/fpls.2025.1700556 · 2026-01-15

## TL;DR

This paper introduces MSAF-VMUNet, a VM-UNet variant with multiscale attention fusion for segmenting paddy pests in natural field images. It reports 79.17% precision on the paddy pest subset of IP102, higher than both U-Net and VM-UNet.

## Contribution

MSAF-VMUNet combines the Visual State Space Model (VSS) with U-Net and adds multiscale attention fusion, targeting accurate paddy pest segmentation at low computational complexity.

## Key findings

- MSAF-VMUNet achieves 79.17% precision in paddy pest segmentation, which is 15.51% and 3.39% higher than U-Net and VM-UNet, respectively.
- The model handles small pest detection, occlusion, and noise without increasing computational complexity relative to VM-UNet.
- MSAF-VMUNet is validated on the IP102 dataset's paddy pest subset for real-world agricultural applications.
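
The abstract gives the margins over the two baselines as 15.51% and 3.39%. As a quick arithmetic check, the sketch below recovers the baseline precisions those margins would imply. It assumes the margins are absolute percentage-point differences rather than relative gains, which this summary does not state; the function name is illustrative only.

```python
def implied_baseline(model_precision: float, margin_points: float) -> float:
    """Baseline precision implied by an absolute percentage-point margin."""
    return round(model_precision - margin_points, 2)


MSAF_VMUNET = 79.17  # reported PPIS precision (%)

# Assumption: the margins are percentage points, not relative improvements.
unet = implied_baseline(MSAF_VMUNET, 15.51)
vm_unet = implied_baseline(MSAF_VMUNET, 3.39)

print(f"Implied U-Net precision: {unet}%")
print(f"Implied VM-UNet precision: {vm_unet}%")
```

If the margins were relative gains instead, each baseline would be `79.17 / (1 + margin / 100)`, so the implied values depend on which reading the full paper supports.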

## Abstract

Precise paddy pest image segmentation (PPIS) in real-time natural environments is an important and challenging research problem. Convolutional Neural Networks (CNNs) and Transformers are the most popular architectures for image segmentation, but they are limited by weak modeling of global dependencies and by quadratic computational complexity, respectively. This paper constructs a multiscale attention fusion VM-UNet (MSAF-VMUNet) for PPIS. It combines the long-range dependency modeling ability of the Visual State Space Model (VSS) with the precise localization capability of U-Net at low computational complexity. In the model, a multiscale VSS (MSVSS) block captures long-range contextual information, and an improved attention fusion (IAF) module is designed for multi-level feature learning between the encoder and decoder. An attention VSS module is introduced in the bottleneck layer so that the model adaptively emphasizes key features and suppresses redundant information. Compared with VM-UNet, MSAF-VMUNet can effectively model global-local and contextual relationships at the scale layer, and it improves detection of pests of varying size and shape without increasing computational complexity. Experimental results on the paddy pest subset of the public IP102 dataset show that MSAF-VMUNet can effectively address the key challenges in field PPIS, including small pest detection, occlusion and noise handling, and preprocessing requirements. The PPIS precision is 79.17%, which is 15.51% and 3.39% higher than that of the traditional U-Net and the recent VM-UNet, respectively. The model provides an effective and reliable solution for pest control detection systems in smart agriculture.
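
The complexity contrast drawn in the abstract can be illustrated with a toy example. The sketch below is a generic scalar linear state-space recurrence, not the paper's VSS or MSVSS block. The coefficients `a`, `b`, `c`, the patch size, the image sizes, and all function names are assumptions chosen for illustration. It shows that a scan visits each token once, while full self-attention scores every pair of tokens.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Toy scalar state-space recurrence.

    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t
    One pass over the sequence, so cost grows linearly with its length.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def attention_pairs(num_tokens):
    """Number of query-key scores that full self-attention computes."""
    return num_tokens * num_tokens


def num_patches(height, width, patch):
    """Tokens obtained by splitting an image into non-overlapping patches."""
    return (height // patch) * (width // patch)


# Hypothetical input sizes; the patch size of 4 is an assumption.
for side in (224, 448):
    n = num_patches(side, side, 4)
    print(f"{side}x{side}: scan steps={n}, attention pairs={attention_pairs(n)}")

# An impulse at the first position still influences later outputs
# through the hidden state, which is how long-range context is carried.
print(ssm_scan([1.0, 0.0, 0.0]))
```

Doubling the image side quadruples the number of scan steps but multiplies the number of attention pairs by sixteen, which is the scaling gap that motivates state-space backbones for high-resolution field images.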

## Full-text entities

- **Diseases:** Paddy pest (MESH:D029021)

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12854137/full.md

---
Source: https://tomesphere.com/paper/PMC12854137