# GrapeUL-YOLO: bidirectional cross-scale fusion with elliptical anchors for robust grape detection in orchards

**Authors:** Xiuli Zhu, Zhenghong Yu, Chengwei Li

PMC · DOI: 10.3389/fpls.2025.1701817 · Frontiers in Plant Science · 2026-01-02

## TL;DR

This paper introduces GrapeUL-YOLO, a lightweight and accurate grape detection model for orchards. It builds on YOLOv11 and adds a redesigned backbone, a bidirectional fusion neck, and a detection head with elliptical anchors.

## Contribution

Proposes a Cross-Scale Residual Feature Backbone (CSRB), an Adaptive Bidirectional Fusion Network (ABFN) neck, and a shape-adaptive detection head with elliptical anchors for grape detection.
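
The summary does not say how the elliptical anchors are parameterized or matched to ground truth. The following is a minimal sketch under two assumptions. First, an anchor is an axis-aligned ellipse given by a center and two semi-axes. Second, a ground-truth box is converted to its inscribed ellipse. Overlap is measured with a rasterized ellipse IoU, and the best anchor shape is chosen by shape-only matching, with every anchor centered on the target. The names `Ellipse`, `ellipse_from_box`, `ellipse_iou`, `best_anchor`, and the sampling `step` are hypothetical illustrations, not the authors' code.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Ellipse:
    """Hypothetical anchor: axis-aligned ellipse with centre (cx, cy), semi-axes a (x) and b (y)."""
    cx: float
    cy: float
    a: float
    b: float

    def contains(self, x: float, y: float) -> bool:
        # Standard ellipse inequality ((x - cx) / a)^2 + ((y - cy) / b)^2 <= 1
        return ((x - self.cx) / self.a) ** 2 + ((y - self.cy) / self.b) ** 2 <= 1.0


def ellipse_from_box(x1: float, y1: float, x2: float, y2: float) -> Ellipse:
    """Inscribed ellipse of an (x1, y1, x2, y2) box (assumed conversion)."""
    return Ellipse((x1 + x2) / 2, (y1 + y2) / 2, (x2 - x1) / 2, (y2 - y1) / 2)


def ellipse_iou(e1: Ellipse, e2: Ellipse, step: float = 0.5) -> float:
    """Approximate IoU by testing grid-cell centres over the joint bounding region."""
    xmin = min(e1.cx - e1.a, e2.cx - e2.a)
    xmax = max(e1.cx + e1.a, e2.cx + e2.a)
    ymin = min(e1.cy - e1.b, e2.cy - e2.b)
    ymax = max(e1.cy + e1.b, e2.cy + e2.b)
    nx = max(1, math.ceil((xmax - xmin) / step))
    ny = max(1, math.ceil((ymax - ymin) / step))
    inter = union = 0
    for i in range(nx):
        x = xmin + (i + 0.5) * step
        for j in range(ny):
            y = ymin + (j + 0.5) * step
            in1, in2 = e1.contains(x, y), e2.contains(x, y)
            if in1 and in2:
                inter += 1
            if in1 or in2:
                union += 1
    return inter / union if union else 0.0


def best_anchor(gt_box: Tuple[float, float, float, float],
                anchor_shapes: Sequence[Tuple[float, float]]) -> Tuple[int, float]:
    """Shape-only matching: centre each (a, b) anchor on the target and keep the highest IoU."""
    gt = ellipse_from_box(*gt_box)
    best_idx, best_iou = -1, 0.0
    for idx, (a, b) in enumerate(anchor_shapes):
        iou = ellipse_iou(gt, Ellipse(gt.cx, gt.cy, a, b))
        if iou > best_iou:
            best_idx, best_iou = idx, iou
    return best_idx, best_iou


# A tall 20 x 40 cluster box best matches the (10, 20) elliptical anchor
print(best_anchor((0, 0, 20, 40), [(10, 10), (10, 20), (30, 30)]))  # -> (1, 1.0)
```

Rasterizing keeps the sketch short and exact for identical shapes. A closed-form or polygon-based ellipse overlap would be faster inside a training loop.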

## Key findings

- GrapeUL-YOLO achieves an mAP@0.5 of 0.912 and mAP@0.5:0.95 of 0.576 on the Embrapa WGISD dataset.
- The model has only 5.11M parameters and an average detection time of 16.9ms per image.
- It outperforms nine mainstream models, including CenterNet and YOLOv11, on both mAP@0.5 and mAP@0.5:0.95.
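
mAP@0.5 counts a detection as correct when its IoU with an unmatched ground-truth box is at least 0.5. mAP@0.5:0.95 averages AP over the ten IoU thresholds 0.50, 0.55, up to 0.95 (the COCO convention). Stricter thresholds reward tighter box boundaries, which is why 0.576 sits well below 0.912. The following is a minimal single-class, single-image sketch that assumes greedy score-ordered matching and all-point interpolated AP. The reported figures were presumably produced by a standard evaluation toolkit whose matching and interpolation rules may differ. The function names are illustrative.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: Sequence[Tuple[float, Box]], gts: Sequence[Box],
                      iou_thr: float) -> float:
    """AP for one class: dets are (score, box) pairs, gts are ground-truth boxes."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp = 0
    precisions: List[float] = []
    recalls: List[float] = []
    ranked = sorted(dets, key=lambda d: d[0], reverse=True)
    for rank, (_, box) in enumerate(ranked, start=1):
        # Greedy: match to the highest-IoU ground truth that is not yet taken
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            iou = box_iou(box, gt)
            if not matched[j] and iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched[best_j] = True
            tp += 1
        precisions.append(tp / rank)
        recalls.append(tp / len(gts))
    # All-point interpolation: replace precision by its running maximum from the right
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def map_50_95(dets: Sequence[Tuple[float, Box]], gts: Sequence[Box]) -> float:
    thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]  # 0.50, 0.55, ..., 0.95
    return sum(average_precision(dets, gts, t) for t in thresholds) / len(thresholds)


gts = [(0, 0, 10, 10)]
dets = [(0.9, (1, 0, 11, 10))]  # IoU = 90 / 110, about 0.818
print(average_precision(dets, gts, 0.5), map_50_95(dets, gts))  # -> 1.0 0.7
```

A box that is slightly off by one pixel still scores full AP at 0.5. It passes only 7 of the 10 thresholds, though, so its mAP@0.5:0.95 drops to 0.7.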

## Abstract

Accurate grape detection in orchards is a key step toward automated harvesting. Orchard environments pose several challenges, such as complex backgrounds, variable lighting, and dense fruit occlusion. To address them, this study proposes a highly robust real-time grape detection model for orchard scenarios, Grapevine Ultra-Lightweight YOLO (GrapeUL-YOLO). The model is built on YOLOv11 and improves detection performance through three designs. First, it adopts a Cross-Scale Residual Feature Backbone (CSRB) as the feature extraction network. The CSRB combines 16× downsampling with modules such as C3k2_SP and SPPELAN, which reduces computational complexity while retaining multi-scale grape features, from small clusters to entire clusters. Second, it constructs an Adaptive Bidirectional Fusion Network (ABFN) in the detection Neck. Through CARAFE content-aware upsampling and a bidirectional cross-scale concatenation mechanism, the ABFN strengthens the interaction between spatial details and semantic information, improving feature fusion in densely occluded scenes. Third, it designs a shape-adaptive detection Head that uses customized elliptical anchor boxes to match the natural shape of grapes and detects grape targets of different sizes through scale division. On the Embrapa WGISD dataset, GrapeUL-YOLO reaches an mAP@0.5 of 0.912 and an mAP@0.5:0.95 of 0.576, outperforming 9 mainstream models, including CenterNet and YOLOv11, on both metrics. The model also has only 5.11M parameters and an average detection time of 16.9 ms per image. It therefore balances high precision with a lightweight design and provides an efficient solution for automated grape detection and harvesting in orchards.
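
CARAFE (Content-Aware ReAssembly of FEatures) upsamples a feature map in two steps. For every output location it predicts a softmax-normalized k_up × k_up kernel from the feature content. It then uses that kernel to reassemble the neighborhood around the corresponding source location. The sketch below covers only the reassembly step, on a single-channel map. In CARAFE the kernel logits come from a learned channel compressor and content encoder; here they are passed in directly. Borders are handled by clamping, which is a simplification. `carafe_reassemble` and `make_logits` are hypothetical names, and this is not the authors' ABFN code.

```python
import math
from typing import List

Grid = List[List[float]]


def softmax(logits: List[float]) -> List[float]:
    # Numerically stable softmax, playing the role of CARAFE's kernel normalizer
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def carafe_reassemble(feature: Grid, kernel_logits: List[List[List[float]]],
                      scale: int, k_up: int) -> Grid:
    """Content-aware reassembly on a single-channel H x W map.

    kernel_logits[oy][ox] holds k_up * k_up logits for output pixel (oy, ox);
    in CARAFE these come from a learned encoder, here they are given.
    """
    assert k_up % 2 == 1, "k_up must be odd so the kernel has a centre tap"
    h, w = len(feature), len(feature[0])
    r = k_up // 2
    out: Grid = []
    for oy in range(h * scale):
        row = []
        for ox in range(w * scale):
            weights = softmax(kernel_logits[oy][ox])
            sy, sx = oy // scale, ox // scale  # source location = floor(output / scale)
            acc, idx = 0.0, 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    # Clamp to the border (padding choice is an assumption of this sketch)
                    y = min(max(sy + dy, 0), h - 1)
                    x = min(max(sx + dx, 0), w - 1)
                    acc += weights[idx] * feature[y][x]
                    idx += 1
            row.append(acc)
        out.append(row)
    return out


def make_logits(out_h: int, out_w: int, k_up: int,
                centre_boost: float = 0.0) -> List[List[List[float]]]:
    """Hypothetical helper: zero logits everywhere, optionally boosting the centre tap."""
    centre = (k_up * k_up) // 2
    return [[[centre_boost if i == centre else 0.0 for i in range(k_up * k_up)]
             for _ in range(out_w)] for _ in range(out_h)]


up = carafe_reassemble([[1.0, 2.0], [3.0, 4.0]], make_logits(4, 4, 3, centre_boost=50.0),
                       scale=2, k_up=3)
print([[round(v, 3) for v in row] for row in up])
# -> [[1.0, 1.0, 2.0, 2.0], [1.0, 1.0, 2.0, 2.0], [3.0, 3.0, 4.0, 4.0], [3.0, 3.0, 4.0, 4.0]]
```

With uniform logits, the reassembly behaves like a local mean filter. With a sharply peaked center logit, it reduces to nearest-neighbor upsampling. Learned, content-dependent logits let each output location choose its own blend between these two extremes.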

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12808474/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12808474/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/PMC12808474/full.md

---
Source: https://tomesphere.com/paper/PMC12808474