Self-Consistency in Vision-Language Models for Precision Agriculture: Multi-Response Consensus for Crop Disease Management
Mihir Gupta, Abhay Mangla, Ross Greer, Pratik Desai

TL;DR
This paper introduces a domain-aware framework combining prompt-based expert evaluation and self-consistency mechanisms to improve vision-language model accuracy in crop disease diagnosis, enabling reliable real-time agricultural decisions.
Contribution
It presents a self-consistency approach, paired with prompt-based expert evaluation, tailored to agricultural image analysis; in the reported experiments it raises diagnostic accuracy from 82.2% to 87.8%.
Findings
Diagnostic accuracy improved from 82.2% to 87.8%.
Symptom analysis accuracy increased from 38.9% to 52.2%.
Treatment recommendation accuracy rose from 27.8% to 43.3%.
Abstract
Precision agriculture relies heavily on accurate image analysis for crop disease identification and treatment recommendation, yet existing vision-language models (VLMs) often underperform in specialized agricultural domains. This work presents a domain-aware framework for agricultural image processing that combines prompt-based expert evaluation with self-consistency mechanisms to enhance VLM reliability in precision agriculture applications. We introduce two key innovations: (1) a prompt-based evaluation protocol that configures a language model as an expert plant pathologist for scalable assessment of image analysis outputs, and (2) a cosine-consistency self-voting mechanism that generates multiple candidate responses from agricultural images and selects the most semantically coherent diagnosis using domain-adapted embeddings. Applied to maize leaf disease identification from field…
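The self-voting step described in the abstract can be illustrated with a minimal sketch: embed each candidate response, score each one by its mean cosine similarity to the others, and keep the highest-scoring candidate. This is an illustration under stated assumptions, not the authors' implementation. The names `toy_embed`, `cosine`, and `select_most_consistent` are hypothetical, and the bag-of-words embedding is only a stand-in for the domain-adapted embeddings the paper mentions. Scoring by mean pairwise similarity is also an assumption, since the abstract does not specify the exact selection rule.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in embedding: bag-of-words token counts.

    Assumption: the paper uses domain-adapted embeddings; this toy
    version only makes the sketch runnable without external models.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_most_consistent(candidates, embed=toy_embed):
    """Return (index, text) of the candidate closest, on average, to the rest.

    Ties are broken in favor of the earliest candidate.
    """
    if not candidates:
        raise ValueError("at least one candidate response is required")
    if len(candidates) == 1:
        return 0, candidates[0]

    vectors = [embed(c) for c in candidates]
    best_index, best_score = 0, float("-inf")
    for i, vec_i in enumerate(vectors):
        # Mean similarity of candidate i to every other candidate.
        others = [cosine(vec_i, vec_j) for j, vec_j in enumerate(vectors) if j != i]
        score = sum(others) / len(others)
        if score > best_score:
            best_index, best_score = i, score
    return best_index, candidates[best_index]


if __name__ == "__main__":
    # Hypothetical model outputs for one maize leaf image.
    responses = [
        "gray leaf spot lesions",
        "gray leaf spot",
        "gray leaf spot on maize",
        "common rust pustules",
    ]
    index, diagnosis = select_most_consistent(responses)
    print(index, diagnosis)
```

In practice the `embed` argument would be swapped for a sentence-embedding model returning dense vectors, with `cosine` adapted accordingly; the selection logic itself would stay the same.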
