WisWheat: A Three-Tiered Vision-Language Dataset for Wheat Management
Bowen Yuan, Selena Song, Javier Fernandez, Yadan Luo, Mahsa Baktashmotlagh, Zijian Wang

TL;DR
WisWheat introduces a specialized three-tiered vision-language dataset designed to improve AI-based wheat management by providing domain-specific data for better reasoning and decision-making.
Contribution
The paper presents a novel wheat-specific dataset with three layers to enhance vision-language models' performance in wheat management tasks.
Findings
Fine-tuning VLMs on WisWheat improves accuracy in stress and growth stage tasks.
Fine-tuned on WisWheat, Qwen2.5-VL-7B achieves over 79% accuracy on wheat stress diagnosis.
Models fine-tuned on WisWheat surpass general-purpose models such as GPT-4o on wheat management tasks.
Abstract
Wheat management strategies play a critical role in determining yield. Traditional management decisions often rely on labour-intensive expert inspections, which are expensive, subjective and difficult to scale. Recently, Vision-Language Models (VLMs) have emerged as a promising solution to enable scalable, data-driven management support. However, due to a lack of domain-specific knowledge, directly applying VLMs to wheat management tasks results in poor quantification and reasoning capabilities, ultimately producing vague or even misleading management recommendations. In response, we propose WisWheat, a wheat-specific dataset with a three-layered design to enhance VLM performance on wheat management tasks: (1) a foundational pretraining dataset of 47,871 image-caption pairs for coarsely adapting VLMs to wheat morphology; (2) a quantitative dataset comprising 7,263 VQA-style…
Taxonomy
Topics: Smart Agriculture and AI · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
