# RAM-UNet: an improved U-Net–based semantic segmentation model for the main stem of mature soybean plants

**Authors:** Li Zhu, Wen Li, Haitao Fu, Xiaoyao Li, Yuxuan Feng

PMC · DOI: 10.3389/fpls.2026.1779621 · Frontiers in Plant Science · 2026-03-10

## TL;DR

This paper introduces RAM-UNet, a U-Net-based semantic segmentation model for the main stems of mature soybean plants. It reaches 90.58% mIoU on a self-constructed dataset and outperforms U-Net, DeepLabv3+, PSPNet, and SegNet.

## Contribution

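RAM-UNet modifies U-Net with a ResNet50 backbone, deformable convolutions in place of standard convolutions, a CBAM-enhanced ASPP module (C-ASPP) with four dilation rates in the encoder, a multi-scale attention aggregation (MSAA) module in the decoder, and a composite loss function combining Dice loss and cross-entropy loss.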
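To make the composite loss concrete, here is a minimal sketch of a binary Dice plus cross-entropy loss in plain Python. It is an illustration under assumptions, not the authors' implementation: the function name `composite_loss`, the equal default weights, and the smoothing constant `eps` are all hypothetical, and this summary does not state how the paper weights the two terms.

```python
import math


def composite_loss(probs, targets, dice_weight=1.0, ce_weight=1.0, eps=1e-7):
    """Binary Dice + cross-entropy loss over flat lists of pixels.

    probs:   predicted foreground (stem) probabilities in [0, 1]
    targets: ground-truth labels, 1 for stem and 0 for background
    The weights and eps are illustrative assumptions, not values from the paper.
    """
    if not probs or len(probs) != len(targets):
        raise ValueError("probs and targets must be non-empty and of equal length")

    # Cross-entropy term: mean negative log-likelihood per pixel.
    ce = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        ce -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    ce /= len(probs)

    # Soft Dice term: measures overlap with the foreground, so confidently
    # rejected background pixels contribute almost nothing to it.
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2.0 * intersection + eps) / (sum(probs) + sum(targets) + eps)

    return ce_weight * ce + dice_weight * (1.0 - dice)


if __name__ == "__main__":
    # One stem pixel among 100: predicting "almost background" everywhere
    # keeps cross-entropy small, while the Dice term stays large.
    sparse_targets = [1] + [0] * 99
    flat_probs = [0.01] * 100
    print("CE only:  ", composite_loss(flat_probs, sparse_targets, dice_weight=0.0))
    print("Dice + CE:", composite_loss(flat_probs, sparse_targets))
```

The usage example shows why an overlap-based term is a common remedy for sparse foregrounds: a prediction that misses the only stem pixel is penalized lightly by cross-entropy alone but heavily once the Dice term is added.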

## Key findings

- RAM-UNet achieves 90.58% mIoU, 94.99% recall, and 94.58% precision on a self-constructed soybean dataset.
- The model improves mIoU over U-Net, DeepLabv3+, PSPNet, and SegNet by 6.41%, 10.51%, 22.41%, and 17.37%, respectively.
- Automatic stem length measurements correlate strongly with manual ones (R² = 0.9746).
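For readers unfamiliar with the segmentation metrics above, the following sketch shows one common way to compute mIoU, precision, and recall from predicted and ground-truth label masks. It is a hedged illustration: the function name `segmentation_metrics` is hypothetical, and the sketch assumes precision and recall are computed for the stem class only, which this summary does not confirm.

```python
def segmentation_metrics(pred, truth, num_classes=2, positive=1):
    """Return (mIoU, precision, recall) for flat lists of integer class labels.

    Assumption: class 1 is the stem and class 0 is the background.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")

    # confusion[t][p] counts pixels of true class t predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, truth):
        confusion[t][p] += 1

    # Per-class IoU = TP / (TP + FP + FN), averaged over classes that occur.
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp
        fn = sum(confusion[c]) - tp
        union = tp + fp + fn
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)
    miou = sum(ious) / len(ious) if ious else 0.0

    # Precision and recall for the positive (stem) class.
    tp = confusion[positive][positive]
    predicted_pos = sum(confusion[r][positive] for r in range(num_classes))
    actual_pos = sum(confusion[positive])
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    return miou, precision, recall


if __name__ == "__main__":
    # Toy 8-pixel mask with made-up labels, for illustration only.
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    pred = [1, 1, 0, 1, 0, 0, 0, 0]
    print(segmentation_metrics(pred, truth))
```

Because mIoU averages the IoU of every class, a large and easily segmented background can lift the mean; per-class IoU for the stem is worth inspecting alongside it.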

## Abstract

As the key structure connecting the vegetative and reproductive organs of soybean plants, the main stem plays a crucial role, and its morphological parameters serve as core phenotypic indicators for evaluating plant growth, lodging resistance, and yield potential. At the mature stage, the main stem exhibits high similarity to pods in color and texture, along with complex curvature and severe occlusion by pods and leaves, making accurate and continuous extraction challenging for conventional segmentation methods. To address this, this study proposes RAM-UNet, a high-precision semantic segmentation model based on an improved U-Net architecture. The model adopts ResNet50 as the backbone and replaces standard convolutions with deformable convolutions to capture curved stem morphology and improve feature extraction for low-contrast edges. In the encoder, the Convolutional Block Attention Module (CBAM) is combined with an improved atrous spatial pyramid pooling (ASPP) module (C-ASPP) with four dilation rates, enhancing multi-scale feature representation compared to the original three-rate design. A multi-scale attention aggregation (MSAA) module in the decoder improves continuity and integrity of stem boundaries. During training, a composite loss function combining Dice loss and cross-entropy loss is employed to mitigate foreground pixel sparsity. Experimental results on a self-constructed dataset show that RAM-UNet achieves a mean Intersection over Union (mIoU) of 90.58%, with Recall and Precision reaching 94.99% and 94.58%, respectively. Compared with U-Net, DeepLabv3+, PSPNet, and SegNet, RAM-UNet improves mIoU by 6.41%, 10.51%, 22.41%, and 17.37%, respectively. Automatically measured stem lengths show high agreement with manual measurements (R² = 0.9746), validating practical applicability. RAM-UNet also generalizes well on the public PASCAL VOC 2012 dataset, achieving an mIoU of 73.14%. 
The results indicate that the proposed model enables high-precision and continuous segmentation of main stems in mature soybean plants, providing an effective technical solution for automated and non-destructive measurement of crop phenotypic parameters.
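The agreement between automatic and manual stem lengths is summarized in the abstract by R². As a rough sketch of how such a value can be obtained, the code below computes the coefficient of determination of a simple least-squares line, which equals the squared Pearson correlation. This is an assumption about the definition used; the summary does not say whether the paper fits a regression line or compares against the identity line. The function name `r_squared` and the sample lengths are hypothetical and are not data from the paper.

```python
def r_squared(manual, automatic):
    """R^2 of a least-squares line fitted to automatic vs. manual measurements.

    For simple linear regression this equals the squared Pearson correlation.
    """
    n = len(manual)
    if n != len(automatic) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    mean_x = sum(manual) / n
    mean_y = sum(automatic) / n
    sxx = sum((x - mean_x) ** 2 for x in manual)
    syy = sum((y - mean_y) ** 2 for y in automatic)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(manual, automatic))
    if sxx == 0 or syy == 0:
        raise ValueError("both sequences must contain some variation")

    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical stem lengths in centimetres, for illustration only.
    manual_cm = [48.3, 52.0, 61.5, 70.2, 75.8]
    automatic_cm = [47.9, 53.1, 60.8, 71.0, 74.9]
    print(f"R^2 = {r_squared(manual_cm, automatic_cm):.4f}")
```

A high R² under this definition indicates a strong linear relationship but does not by itself rule out a systematic offset or scale error, so an error measure such as mean absolute error is a useful companion when validating measurements.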

## Full-text entities

- **Species:** Glycine max (soybean, species) [taxon 3847]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13008844/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13008844/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/PMC13008844/full.md

---
Source: https://tomesphere.com/paper/PMC13008844