# MSRRT-DETR: A high-precision apple detection method with strong cross-domain generalization capability in complex orchard scenes

**Authors:** Xinyu Zhang, Sawut Mamat, Xiaohuang Liu, Jiufen Liu, Run Liu, Guangjie Wu, Ping Zhu, Hongyu Li, Min Ma, Xiaotong Liu

PMC · DOI: 10.1371/journal.pone.0342854 · PLOS One · 2026-03-13

## TL;DR

This paper introduces MSRRT-DETR, an RT-DETR-based apple detector that reaches 87.3% mAP50 on the self-constructed TSApple dataset at 30.2 FPS and generalizes across four public datasets, including MinneApple.

## Contribution

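MSRRT-DETR extends the RT-DETR framework with three enhancements: a Multi-Scale Convolutional Attention Module (MSBlock) for feature representation at different scales, a Spatial and Channel Synergistic Attention Module (SCSA) for object focus and discriminative capability, and a Re-parameterized Feature Pyramid Network (RepGFPN) for efficient multi-scale feature fusion.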

## Key findings

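- MSRRT-DETR achieves 87.3% mAP50 on the self-constructed TSApple dataset, exceeding YOLOv8, YOLO11, and YOLO12 by 2.0–7.9 percentage points, Faster R-CNN, Mask R-CNN, and Cascade R-CNN by 5.1–8.6 percentage points, and the RT-DETR series by 1.1–2.6 percentage points.
- The model demonstrates strong cross-domain generalization on four public datasets, including MinneApple, across diverse scenarios and fruit varieties.
- MSRRT-DETR runs at 30.2 FPS, an inference speed comparable to the YOLO series.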
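
For readers unfamiliar with the two metrics quoted above, the following minimal Python sketch illustrates what they measure. It is not the paper's evaluation code. mAP50 treats a predicted box as a match when its intersection over union (IoU) with a ground-truth box is at least 0.5, and FPS converts to a per-image latency by simple division. The function names and the example boxes are hypothetical, and a full mAP50 computation would additionally rank predictions by confidence and integrate the precision-recall curve.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_iou50(pred_box, gt_box, threshold=0.5):
    """True if the prediction overlaps the ground truth enough for mAP50."""
    return iou(pred_box, gt_box) >= threshold


def latency_ms(fps):
    """Average time per image, in milliseconds, implied by a frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Hypothetical boxes, for illustration only.
    ground_truth = (0, 0, 10, 10)
    shifted_prediction = (5, 0, 15, 10)  # overlaps half of the ground truth
    print(iou(ground_truth, shifted_prediction))
    print(matches_at_iou50(ground_truth, shifted_prediction))
    print(round(latency_ms(30.2), 1))  # per-image time at the reported 30.2 FPS
```

Under this conversion, the reported 30.2 FPS corresponds to roughly 33 ms per image.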

## Abstract

Accurate fruit detection is a key component of precision agriculture applications such as crop yield estimation, orchard management, and intelligent harvesting. In scenarios where immature fruits exhibit visual similarity to the background or where significant varietal differences exist, traditional models often lack sufficient generalization ability, resulting in reduced detection accuracy and unstable predictions. To address this problem, this paper proposes a fruit detection model, MSRRT-DETR, which achieves a balance of high accuracy, real-time performance, and strong generalization capability. To improve detection accuracy and robustness in complex orchard environments, MSRRT-DETR introduces three major enhancements to the RT-DETR framework: a Multi-Scale Convolutional Attention Module (MSBlock) to enhance feature representation at different scales; a Spatial and Channel Synergistic Attention Module (SCSA) to improve object focus and discriminative capability; and a Re-parameterized Feature Pyramid Network (RepGFPN) to achieve efficient multi-scale feature fusion. Experimental results show that MSRRT-DETR achieves a mAP50 of 87.3% on the self-constructed TSApple dataset, outperforming mainstream lightweight models YOLOv8, YOLO11, and YOLO12 by 2.0–7.9 percentage points, exceeding two-stage detectors including Faster R-CNN, Mask R-CNN, and Cascade R-CNN by 5.1–8.6 percentage points, and surpassing the RT-DETR series by 1.1–2.6 percentage points. With an inference speed of 30.2 FPS, comparable to the YOLO series, MSRRT-DETR achieves an excellent balance between accuracy and real-time performance. In addition, MSRRT-DETR demonstrates outstanding cross-domain generalization capability on four public datasets including MinneApple, validating its stable applicability across diverse scenarios and fruit varieties. 
MSRRT-DETR combines high recognition accuracy, fast inference, and strong cross-domain generalization, fully meeting the requirements of fruit detection in complex agricultural scenarios. The model provides robust technical support for intelligent monitoring and automated orchard management in precision agriculture, and holds significant practical value and broad potential for application.

## Full-text entities

- **Species:** Malus domestica (apple, species) [taxon 3750]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12987427/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12987427/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/PMC12987427/full.md

---
Source: https://tomesphere.com/paper/PMC12987427