TL;DR
Agri-CPJ is a training-free, explainable framework for crop pest and disease diagnosis. It refines image captions before answering and uses an LLM judge to choose between candidate answers, improving both accuracy and interpretability.
Contribution
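A training-free, multimodal Caption-Prompt-Judge pipeline. A vision-language model first produces a structured morphological caption that is iteratively refined. Two candidate answers are then generated from complementary viewpoints, and an LLM judge selects the stronger one. This improves diagnostic accuracy and explainability without any fine-tuning.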
Findings
Caption refinement has the largest individual impact: ablations show that skipping it consistently degrades downstream accuracy for both models tested.
Pairing GPT-5-Nano with captions generated by GPT-5-mini improves disease classification by 22.7 percentage points (pp).
Achieves high accuracy on AgMMU-MCQs, comparable to that of open-source models.
Abstract
Crop disease diagnosis from field photographs faces two recurring problems: models that score well on benchmarks frequently hallucinate species names, and when predictions are correct, the reasoning behind them is typically inaccessible to the practitioner. This paper describes Agri-CPJ (Caption-Prompt-Judge), a training-free few-shot framework in which a large vision-language model first generates a structured morphological caption, iteratively refined through multi-dimensional quality gating, before any diagnostic question is answered. Two candidate responses are then generated from complementary viewpoints, and an LLM judge selects the stronger one based on domain-specific criteria. Caption refinement is the component with the largest individual impact: ablations confirm that skipping it consistently degrades downstream accuracy across both models tested. On CDDMBench, pairing…
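The Caption-Prompt-Judge flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Model calls are passed in as plain Python callables. The quality dimensions (`completeness`, `specificity`, `consistency`), the 0.7 gate threshold, the three-round cap, and the two viewpoint labels are all hypothetical placeholders. The `fake_*` stubs stand in for a vision-language model, a caption scorer, an answering model, and an LLM judge so the control flow can run offline. Their "diagnoses" are toy strings only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical gating dimensions; the paper's actual criteria may differ.
QUALITY_DIMENSIONS = ("completeness", "specificity", "consistency")


@dataclass
class CPJResult:
    caption: str
    candidates: Tuple[str, str]
    chosen: str
    refinement_rounds: int


def refine_caption(
    image_id: str,
    describe: Callable[[str, str], str],
    score: Callable[[str], Dict[str, float]],
    threshold: float = 0.7,  # assumed gate value
    max_rounds: int = 3,     # assumed refinement budget
) -> Tuple[str, int]:
    """Caption the image, then re-prompt with feedback until every dimension passes.

    Returns the last caption produced, even if it still fails after max_rounds.
    """
    caption = describe(image_id, "")
    rounds = 0
    while rounds < max_rounds:
        scores = score(caption)
        failing = [d for d in QUALITY_DIMENSIONS if scores.get(d, 0.0) < threshold]
        if not failing:
            break  # all dimensions pass the quality gate
        caption = describe(image_id, "Improve: " + ", ".join(failing))
        rounds += 1
    return caption, rounds


def caption_prompt_judge(
    image_id: str,
    question: str,
    describe: Callable[[str, str], str],
    score: Callable[[str], Dict[str, float]],
    answer: Callable[[str, str, str], str],
    judge: Callable[[str, str, Tuple[str, str]], int],
    viewpoints: Sequence[str] = ("symptom-focused", "differential-diagnosis"),  # hypothetical
) -> CPJResult:
    # 1. Caption: a refined morphological description, produced before any question is answered.
    caption, rounds = refine_caption(image_id, describe, score)
    # 2. Prompt: two candidate answers conditioned on the caption, one per viewpoint.
    candidates = (
        answer(caption, question, viewpoints[0]),
        answer(caption, question, viewpoints[1]),
    )
    # 3. Judge: returns the index (0 or 1) of the stronger candidate.
    pick = judge(caption, question, candidates)
    return CPJResult(caption, candidates, candidates[pick], rounds)


# --- Offline stubs standing in for real model calls (illustrative only) ---
def fake_describe(image_id: str, feedback: str) -> str:
    caption = f"Leaf {image_id}: circular brown lesions with yellow halos"
    if feedback:  # a refined caption adds the detail the scorer asked for
        caption += "; concentric rings inside lesions"
    return caption


def fake_score(caption: str) -> Dict[str, float]:
    value = 0.9 if "concentric rings" in caption else 0.5
    return {d: value for d in QUALITY_DIMENSIONS}


def fake_answer(caption: str, question: str, viewpoint: str) -> str:
    label = "early blight" if "concentric rings" in caption else "leaf spot"
    return f"[{viewpoint}] {label}"


def fake_judge(caption: str, question: str, candidates: Tuple[str, str]) -> int:
    # Toy criterion: prefer the longer answer (a real judge would apply domain rubrics).
    return max(range(len(candidates)), key=lambda i: len(candidates[i]))


result = caption_prompt_judge(
    "img-001", "What disease is shown?",
    fake_describe, fake_score, fake_answer, fake_judge,
)
print(result.refinement_rounds, result.chosen)
```

Here the first caption fails every gate, one feedback round adds the missing detail, and the judge picks between the two viewpoint answers. Requiring every dimension to pass, rather than thresholding an average, means any single weak criterion triggers another round. This summary does not say how the paper aggregates its quality scores, so treat that choice as an assumption.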
