CPJ: Explainable Agricultural Pest Diagnosis via Caption-Prompt-Judge with LLM-Judged Refinement
Wentao Zhang, Tao Fang, Lina Lu, Lifei Wang, Weihe Zhong

TL;DR
The paper introduces CPJ, a training-free, few-shot framework that uses large vision-language models and LLM-based refinement to improve explainable crop disease diagnosis through structured captions and dual-answer VQA, achieving large gains over no-caption baselines on CDDMBench.
Contribution
CPJ is a novel, training-free framework that enhances agricultural pest diagnosis with structured captions and LLM-based refinement, eliminating the need for costly fine-tuning.
Findings
On CDDMBench, GPT-5-Nano answering with GPT-5-mini captions improves disease classification by +22.7 pp and QA score by +19.5 points over no-caption baselines.
Effective use of multi-angle captions refined via an LLM-as-Judge module.
Provides transparent, evidence-based reasoning for agricultural diagnosis.
Abstract
Accurate and interpretable crop disease diagnosis is essential for agricultural decision-making, yet existing methods often rely on costly supervised fine-tuning and perform poorly under domain shifts. We propose Caption-Prompt-Judge (CPJ), a training-free few-shot framework that enhances Agri-Pest VQA through structured, interpretable image captions. CPJ employs large vision-language models to generate multi-angle captions, which are refined iteratively by an LLM-as-Judge module and then inform a dual-answer VQA process that produces both recognition and management responses. Evaluated on CDDMBench, CPJ significantly improves performance: using GPT-5-mini captions, GPT-5-Nano achieves +22.7 pp in disease classification and +19.5 points in QA score over no-caption baselines. The framework provides transparent, evidence-based reasoning, advancing robust and explainable agricultural…
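The abstract outlines three stages: multi-angle captioning, iterative refinement by an LLM-as-Judge, and a dual-answer VQA step. The following is a minimal sketch of that control flow only, assuming every model call is a plain text-in/text-out function. The names `caption_and_judge` and `dual_answer`, the caption angles, the acceptance threshold, and the round limit are all hypothetical, and the sketch does not claim to reproduce the authors' prompts or to show whether the two answers come from separate calls or a single call.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces standing in for real model calls (not a real API):
#   captioner(image_id, instruction) -> caption text    (a vision-language model)
#   judge(caption) -> (score in [0, 1], feedback text)  (an LLM-as-Judge)
#   answerer(prompt) -> answer text                     (the VQA model)
Captioner = Callable[[str, str], str]
Judge = Callable[[str], Tuple[float, str]]
Answerer = Callable[[str], str]

# Illustrative caption angles; assumed, not taken from the paper.
ANGLES = ["lesion appearance", "affected plant part", "overall plant condition"]


def caption_and_judge(image_id: str, captioner: Captioner, judge: Judge,
                      threshold: float = 0.8, max_rounds: int = 3) -> List[str]:
    """Generate one caption per angle, revising it until the judge accepts it
    or max_rounds revisions have been made."""
    captions = []
    for angle in ANGLES:
        instruction = f"Describe the {angle} in the image."
        caption = captioner(image_id, instruction)
        for _ in range(max_rounds):
            score, feedback = judge(caption)
            if score >= threshold:
                break  # judge accepts this caption
            # Ask the captioner to revise, passing along the judge's feedback.
            caption = captioner(image_id, f"{instruction} Feedback: {feedback}")
        captions.append(caption)
    return captions


def dual_answer(question: str, captions: List[str],
                answerer: Answerer) -> Tuple[str, str]:
    """Prompt the answerer twice with the same captions: diagnosis, then management."""
    context = "\n".join(f"- {c}" for c in captions)
    base = f"Image description:\n{context}\nQuestion: {question}\n"
    diagnosis = answerer(base + "Identify the crop and the disease.")
    management = answerer(base + "Recommend management measures.")
    return diagnosis, management


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs offline; real use would call model APIs.
    def toy_captioner(image_id: str, instruction: str) -> str:
        return "revised caption" if "Feedback:" in instruction else "draft caption"

    def toy_judge(caption: str) -> Tuple[float, str]:
        return (1.0, "ok") if caption.startswith("revised") else (0.4, "add detail")

    def toy_answerer(prompt: str) -> str:
        return "management advice" if "management" in prompt else "diagnosis"

    caps = caption_and_judge("leaf_001.jpg", toy_captioner, toy_judge)
    print(caps)  # ['revised caption', 'revised caption', 'revised caption']
    print(dual_answer("What is wrong with this leaf?", caps, toy_answerer))
    # ('diagnosis', 'management advice')
```

Keeping the judge as a separate callable reflects the training-free framing: captions improve through prompting and feedback alone, with no model weights updated.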
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
