TL;DR
AgroVG is a multi-source benchmark for evaluating agricultural visual grounding models across target types and grounding tasks, and it exposes substantial performance gaps in current models.
Contribution
This paper introduces AgroVG, a large-scale, multi-source benchmark for agricultural visual grounding, supporting diverse tasks and providing a standardized evaluation protocol.
Findings
Zero-shot models perform poorly on multi-target set prediction.
The best models reach a Set-F1 of only 0.35 on multi-target localization.
Mask success rate at the reported threshold remains below 0.17.
Abstract
Visual grounding, the task of localizing objects described by natural-language expressions, is a foundational capability for agricultural AI systems, enabling applications such as selective weeding, disease monitoring, and targeted harvesting. Reliable evaluation of agricultural visual grounding remains challenging because agricultural targets are often small, repetitive, occluded, or irregularly shaped, and instructions may refer to one, many, or no objects in an image. Evaluating this capability therefore requires jointly testing localization accuracy, target-set completeness, and existence-aware abstention. To address these challenges, we introduce AgroVG, a multi-source benchmark that formulates agricultural grounding as generalized set prediction: given an image and a referring expression, a model must return all matching target instances or abstain when no target is…
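To make the set-prediction framing concrete, here is a minimal Python sketch of how a set-level F1 with abstention could be scored for one image and expression. The abstract excerpt does not give the benchmark's exact Set-F1 definition, so everything here is an assumption for illustration: the names `iou` and `set_f1`, the box format `(x1, y1, x2, y2)`, the IoU threshold of 0.5, greedy one-to-one matching, and the convention that an empty prediction on an empty target set scores 1.0.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def set_f1(pred: List[Box], gt: List[Box], thr: float = 0.5) -> float:
    """Set-level F1 for one sample (assumed definition, not the paper's).

    An empty prediction for an empty target set counts as a correct
    abstention; predicting anything when no target exists, or nothing
    when targets exist, scores 0.
    """
    if not pred and not gt:
        return 1.0
    if not pred or not gt:
        return 0.0

    # Greedy one-to-one matching, highest IoU first. This is a
    # simplification; an optimal assignment could match more pairs.
    pairs = sorted(
        ((iou(p, g), i, j) for i, p in enumerate(pred) for j, g in enumerate(gt)),
        reverse=True,
    )
    used_pred, used_gt = set(), set()
    true_pos = 0
    for score, i, j in pairs:
        if score < thr:
            break  # remaining pairs are all below the threshold
        if i in used_pred or j in used_gt:
            continue
        used_pred.add(i)
        used_gt.add(j)
        true_pos += 1

    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gt)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a model that finds one of two targets with no false positives gets precision 1.0 and recall 0.5. This shows how a single score can penalize both missed instances and spurious boxes while still rewarding correct abstention.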
