DINOv3 Meets YOLO26 for Weed Detection in Vegetable Crops
Boyang Deng, Yuzhen Lu

TL;DR
This study introduces a novel weed detection model combining DINOv3 and YOLO26, leveraging self-supervised learning and heterogeneous datasets to improve accuracy and generalization in vegetable crop weeding.
Contribution
It presents a new integrated model using a fine-tuned DINOv3 backbone within YOLO26, enhancing detection performance and cross-domain robustness for precision agriculture.
Findings
Achieved up to +5.4% mAP50 on in-domain data
Improved cross-domain mAP50 by +14.0% and +11.9%
Maintained real-time detection at ~28.5 fps
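To put the throughput figure in context, the short sketch below converts the reported ~28.5 fps into an average per-frame time. The helper name `frame_time_ms` is illustrative and not taken from the paper.

```python
def frame_time_ms(fps: float) -> float:
    """Average time spent per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Reported throughput from the findings above.
print(round(frame_time_ms(28.5), 1))  # 35.1
```

At roughly 35 ms per frame, the detector fits within the frame period of a camera running at about 28 frames per second or slower.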
Abstract
Developing robust models for precision vegetable weeding is currently constrained by the scarcity of large-scale, annotated crop-weed datasets. To address this limitation, this study proposes a foundational crop-weed detection model built by integrating heterogeneous datasets and leveraging self-supervised learning. A total of 618,642 crop-weed images were initially collected and then refined, through a sequential curation strategy, to 199,388 images used to fine-tune a DINOv3 vision transformer (ViT-small). The fine-tuned DINOv3 backbone was then integrated into YOLO26, serving either as the primary backbone or as part of a dual-backbone architecture. A feature alignment loss was introduced in the dual-backbone framework to enhance feature fusion with minimal computational overhead. Experimental results show that the proposed DINOv3-finetuned ViT-small-based YOLO26-large achieved up…
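The abstract does not spell out the form of the feature alignment loss, so the following is only a minimal sketch of one common choice: a cosine-distance penalty between paired feature vectors from the two backbones, added to the detection loss with a small weight. The function names, the cosine form, and the default `weight=0.1` are all assumptions made for illustration, not the authors' formulation. Plain Python lists are used for readability; a real training loop would use a tensor library.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # epsilon guards against zero vectors


def alignment_loss(cnn_feats, vit_feats):
    """Mean (1 - cosine similarity) over paired feature vectors.

    Assumption: both inputs hold one vector per spatial location and have
    already been projected to the same dimensionality.
    """
    if not cnn_feats or len(cnn_feats) != len(vit_feats):
        raise ValueError("feature lists must be non-empty and equally long")
    distances = (1.0 - cosine_similarity(u, v) for u, v in zip(cnn_feats, vit_feats))
    return sum(distances) / len(cnn_feats)


def total_loss(detection_loss, cnn_feats, vit_feats, weight=0.1):
    """Detection loss plus a weighted alignment term (weight is hypothetical)."""
    return detection_loss + weight * alignment_loss(cnn_feats, vit_feats)
```

Because a term of this kind is evaluated only during training, it adds no cost at inference time, which is in keeping with the abstract's note about minimal computational overhead.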
Taxonomy
Topics: Smart Agriculture and AI · Remote Sensing in Agriculture · Plant Disease Management Techniques
