Multi-label Instance-level Generalised Visual Grounding in Agriculture
Mohammadreza Haghighat, Alzayat Saleh, Mostafa Rahimi Azghadi

TL;DR
This paper introduces a new dataset and framework for visual grounding in agriculture, addressing the challenge of localising crop and weed instances in field images, which is crucial for precision farming.
Contribution
The paper presents gRef-CW, the first benchmark dataset for agricultural visual grounding, and Weed-VG, a novel modular framework that improves instance-level grounding in complex field conditions.
Findings
Current state-of-the-art grounding models perform poorly on gRef-CW, revealing a substantial domain gap in grounding crop and weed instances.
Weed-VG achieves better grounding accuracy in agricultural images.
The dataset and framework set a new baseline for future research in agricultural visual grounding.
Abstract
Understanding field imagery, such as detecting plants and distinguishing individual crop and weed instances, is a central challenge in precision agriculture. Despite progress in vision-language tasks like captioning and visual question answering, Visual Grounding (VG), the task of localising language-referred objects, remains unexplored in agriculture. A key reason is the lack of suitable benchmark datasets for evaluating grounding models in field conditions, where many plants look highly similar and appear at multiple scales, and where the referred target may be absent from the image. To address these limitations, we introduce gRef-CW, the first dataset designed for generalised visual grounding in agriculture, including negative expressions. Benchmarking current state-of-the-art grounding models on gRef-CW reveals a substantial domain gap, highlighting their inability to ground instances of crops and weeds.…
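To make the generalised setting concrete, the sketch below shows one plausible way to score a single referring expression when the target set may hold zero, one, or many boxes. It is an illustration only, not the evaluation protocol of gRef-CW or Weed-VG: the function names, the 0.5 IoU threshold, and the greedy matching strategy are all assumptions introduced here.

```python
from typing import List, Tuple

# Axis-aligned box as (x1, y1, x2, y2); this layout is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def expression_correct(pred: List[Box], gt: List[Box], thr: float = 0.5) -> bool:
    """Hypothetical per-expression check for generalised grounding.

    - Negative expression (no ground-truth boxes): correct only if
      the model predicts nothing.
    - Otherwise: every prediction must match a distinct ground-truth
      box with IoU >= thr, and no ground-truth box may be left over.
    Matching is greedy, which is simpler than optimal assignment and
    can differ from it in crowded scenes.
    """
    if not gt:
        return not pred
    if len(pred) != len(gt):
        return False
    unmatched = list(gt)
    for p in pred:
        best = max(unmatched, key=lambda g: iou(p, g))
        if iou(p, best) < thr:
            return False
        unmatched.remove(best)
    return True
```

Under these assumptions, an expression that refers to a weed species absent from the image counts as correct only when the prediction list is empty, which is what separates the generalised task from classic single-target grounding.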
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Smart Agriculture and AI
