WisWheat: A Three-Tiered Vision-Language Dataset for Wheat Management

Bowen Yuan; Selena Song; Javier Fernandez; Yadan Luo; Mahsa Baktashmotlagh; Zijian Wang

arXiv:2506.06084·cs.CV·June 9, 2025

Open Access

TL;DR

WisWheat introduces a specialized three-tiered vision-language dataset designed to improve AI-based wheat management by providing domain-specific data for better reasoning and decision-making.

Contribution

The paper presents a novel wheat-specific dataset with three layers to enhance vision-language models' performance in wheat management tasks.

Findings

01. Fine-tuning VLMs on WisWheat improves accuracy on wheat stress and growth stage tasks.

02. Qwen2.5 VL 7B achieves over 79% accuracy on wheat stress diagnosis.

03. Models fine-tuned on WisWheat surpass general-purpose models such as GPT-4o on wheat management tasks.

Abstract

Wheat management strategies play a critical role in determining yield. Traditional management decisions often rely on labour-intensive expert inspections, which are expensive, subjective and difficult to scale. Recently, Vision-Language Models (VLMs) have emerged as a promising solution to enable scalable, data-driven management support. However, due to a lack of domain-specific knowledge, directly applying VLMs to wheat management tasks results in poor quantification and reasoning capabilities, ultimately producing vague or even misleading management recommendations. In response, we propose WisWheat, a wheat-specific dataset with a three-layered design to enhance VLM performance on wheat management tasks: (1) a foundational pretraining dataset of 47,871 image-caption pairs for coarsely adapting VLMs to wheat morphology; (2) a quantitative dataset comprising 7,263 VQA-style…

Taxonomy

Topics: Smart Agriculture and AI · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning