Self-Consistency in Vision-Language Models for Precision Agriculture: Multi-Response Consensus for Crop Disease Management

Mihir Gupta; Abhay Mangla; Ross Greer; Pratik Desai

arXiv:2507.08024·cs.CV·July 14, 2025

TL;DR

This paper introduces a domain-aware framework combining prompt-based expert evaluation and self-consistency mechanisms to improve vision-language model accuracy in crop disease diagnosis, enabling reliable real-time agricultural decisions.

Contribution

It presents a self-consistency approach, paired with prompt-based expert evaluation and tailored to agricultural image analysis, that raises diagnostic accuracy from 82.2% to 87.8% in the reported evaluation.

Findings

01. Diagnostic accuracy improved from 82.2% to 87.8%.

02. Symptom analysis accuracy increased from 38.9% to 52.2%.

03. Treatment recommendation accuracy rose from 27.8% to 43.3%.
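
To put the three findings on a common footing, the short sketch below computes the absolute gain (in percentage points) and the relative gain for each metric. The before/after figures are taken from the findings above; the dictionary keys and the `gains` helper are illustrative names, not anything defined by the paper.

```python
# Before/after accuracies (percent) as reported in the findings above.
findings = {
    "diagnostic accuracy": (82.2, 87.8),
    "symptom analysis accuracy": (38.9, 52.2),
    "treatment recommendation accuracy": (27.8, 43.3),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return absolute, relative


for name, (before, after) in findings.items():
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute:.1f} points ({relative:.1f}% relative)")
```

Read this way, the smallest absolute gain is on diagnosis, which started from the highest baseline, while symptom analysis and treatment recommendation improve more from much lower starting points.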

Abstract

Precision agriculture relies heavily on accurate image analysis for crop disease identification and treatment recommendation, yet existing vision-language models (VLMs) often underperform in specialized agricultural domains. This work presents a domain-aware framework for agricultural image processing that combines prompt-based expert evaluation with self-consistency mechanisms to enhance VLM reliability in precision agriculture applications. We introduce two key innovations: (1) a prompt-based evaluation protocol that configures a language model as an expert plant pathologist for scalable assessment of image analysis outputs, and (2) a cosine-consistency self-voting mechanism that generates multiple candidate responses from agricultural images and selects the most semantically coherent diagnosis using domain-adapted embeddings. Applied to maize leaf disease identification from field…
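
The cosine-consistency self-voting idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the truncated abstract does not give the exact scoring rule, so the sketch assumes each candidate is scored by its mean cosine similarity to the other candidates and the highest-scoring one is kept. The `bow_embed` function is a hypothetical bag-of-words stand-in for the domain-adapted embeddings the paper mentions, and `select_consensus` is an illustrative name.

```python
import numpy as np


def bow_embed(texts: list[str]) -> np.ndarray:
    """Hypothetical stand-in for a domain-adapted embedding model.

    Builds a bag-of-words count vector per text over the shared vocabulary.
    A real system would call an embedding model here instead.
    """
    vocab = sorted({tok for text in texts for tok in text.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = np.zeros((len(texts), len(vocab)))
    for row, text in enumerate(texts):
        for tok in text.lower().split():
            matrix[row, index[tok]] += 1.0
    return matrix


def select_consensus(candidates: list[str], embed=bow_embed) -> tuple[int, np.ndarray]:
    """Pick the candidate most similar, on average, to the other candidates.

    Assumed scoring rule: mean pairwise cosine similarity (self excluded).
    Returns the index of the selected candidate and the per-candidate scores.
    """
    n = len(candidates)
    if n == 0:
        raise ValueError("need at least one candidate response")
    if n == 1:
        return 0, np.array([1.0])

    vectors = embed(candidates)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.maximum(norms, 1e-12)  # guard against empty responses

    similarity = unit @ unit.T                 # pairwise cosine similarities
    np.fill_diagonal(similarity, 0.0)          # ignore self-similarity
    scores = similarity.sum(axis=1) / (n - 1)  # mean similarity to the others
    return int(np.argmax(scores)), scores
```

In use, the candidates would be several responses sampled from the VLM for the same image; the selected response is the one the others agree with most. Whether the paper aggregates similarities in exactly this way is an assumption of the sketch.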
