End-to-end deep learning for directly estimating grape yield from ground-based imagery
Alexander G. Olenskyj, Brent S. Sams, Zhenghao Fei, Vishal Singh, Pranav V. Raja, Gail M. Bornhorst, J. Mason Earles

TL;DR
This study demonstrates the use of deep learning and proximal ground-based imagery to accurately estimate grape yield in vineyards, reducing manual effort and improving scalability in challenging environments.
Contribution
It introduces an end-to-end deep learning approach that eliminates manual labeling, achieving comparable accuracy to object detection models for vineyard yield estimation.
Findings
Transformer and object detection models achieved ~18% error.
End-to-end models performed comparably to object detection.
Saliency maps showed model focus near grape bunches.
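The summary above does not say which metric the ~18% error refers to. As a minimal sketch, assuming it is a mean absolute percentage error (MAPE) between predicted and measured yield, the calculation could look like the following. The function name and the sample values are hypothetical and are not taken from the paper.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Return MAPE in percent for paired actual/predicted values.

    Assumption: the "~18% error" finding is a MAPE-style metric;
    the source summary does not confirm this.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical yield values (arbitrary units), for illustration only.
actual_yield = [10.0, 20.0, 40.0]
predicted_yield = [12.0, 18.0, 40.0]
mape = mean_absolute_percentage_error(actual_yield, predicted_yield)
print(f"MAPE: {mape:.1f}%")
```

Because MAPE divides by the measured value, it weights errors on low-yield points more heavily than errors on high-yield points, which is worth keeping in mind when comparing models on such a figure.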
Abstract
Yield estimation is a powerful tool in vineyard management, as it allows growers to fine-tune practices to optimize yield and quality. However, yield estimation is currently performed using manual sampling, which is time-consuming and imprecise. This study demonstrates the application of proximal imaging combined with deep learning for yield estimation in vineyards. Continuous data collection using a vehicle-mounted sensing kit combined with collection of ground truth yield data at harvest using a commercial yield monitor allowed for the generation of a large dataset of 23,581 yield points and 107,933 images. Moreover, this study was conducted in a mechanically managed commercial vineyard, representing a challenging environment for image analysis but a common set of conditions in the California Central Valley. Three model architectures were tested: object detection, CNN regression, and…
