Agro-Consensus: Semantic Self-Consistency in Vision-Language Models for Crop Disease Management in Developing Countries
Mihir Gupta, Pratik Desai, Ross Greer

TL;DR
This paper presents a cost-effective semantic self-consistency framework for vision-language models to improve crop disease diagnosis and management in developing countries, addressing resource constraints and unreliable connectivity.
Contribution
It introduces a novel semantic clustering and consensus mechanism combined with human-in-the-loop validation to enhance model reliability for agricultural image captioning.
Findings
Achieves 83.1% accuracy with 10 candidate generations, outperforming the baseline.
Improves top-4 accuracy to 94.0%, again above the baseline.
Demonstrates effectiveness on the PlantVillage dataset with a lightweight model.
Abstract
Agricultural disease management in developing countries such as India, Kenya, and Nigeria faces significant challenges due to limited access to expert plant pathologists, unreliable internet connectivity, and cost constraints that hinder the deployment of large-scale AI systems. This work introduces a cost-effective self-consistency framework to improve vision-language model (VLM) reliability for agricultural image captioning. The proposed method employs semantic clustering, using a lightweight (80MB) pre-trained embedding model to group multiple candidate responses. It then selects the most coherent caption -- containing a diagnosis, symptoms, analysis, treatment, and prevention recommendations -- through a cosine similarity-based consensus. A practical human-in-the-loop (HITL) component is incorporated, wherein user confirmation of the crop type filters erroneous generations, ensuring…
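The clustering-and-consensus step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy threshold clustering, the medoid selection rule, the `threshold` value of 0.8, the candidate dictionary layout, and the function names are all assumptions made for illustration. Embeddings are passed in as plain vectors (toy 2-D values here), standing in for the output of the lightweight pre-trained embedding model.

```python
import math
from typing import Dict, List, Optional, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def cluster(embs: List[Sequence[float]], threshold: float) -> List[List[int]]:
    """Greedy clustering (an assumption): join the first cluster whose
    first member is at least `threshold` similar, else start a new one."""
    clusters: List[List[int]] = []
    for i, emb in enumerate(embs):
        for members in clusters:
            if cosine(emb, embs[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def select_consensus(
    candidates: List[Dict[str, str]],
    embeddings: List[Sequence[float]],
    confirmed_crop: Optional[str] = None,
    threshold: float = 0.8,  # hypothetical value, not from the paper
) -> Optional[Dict[str, str]]:
    """Filter by the user-confirmed crop (HITL step), cluster the rest,
    and return the medoid of the largest cluster."""
    keep = [
        i for i, cand in enumerate(candidates)
        if confirmed_crop is None
        or cand["crop"].lower() == confirmed_crop.lower()
    ]
    if not keep:
        return None
    embs = [embeddings[i] for i in keep]
    largest = max(cluster(embs, threshold), key=len)
    # Medoid: the member with the highest total similarity to its cluster.
    best = max(largest, key=lambda i: sum(cosine(embs[i], embs[j]) for j in largest))
    return candidates[keep[best]]

# Toy candidates with made-up 2-D embeddings, for illustration only.
candidates = [
    {"crop": "tomato", "caption": "early blight A"},
    {"crop": "tomato", "caption": "early blight B"},
    {"crop": "tomato", "caption": "late blight"},
    {"crop": "potato", "caption": "early blight"},
    {"crop": "tomato", "caption": "early blight C"},
]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.05], [0.95, 0.05]]

chosen = select_consensus(candidates, embeddings, confirmed_crop="tomato")
print(chosen)
```

In this toy run the potato caption is dropped by the crop confirmation, the three "early blight" tomato captions form the largest cluster, and the caption whose embedding sits between the other two is returned as the consensus.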
