LeafNet: A Large-Scale Dataset and Comprehensive Benchmark for Foundational Vision-Language Understanding of Plant Diseases
Khang Nguyen Quoc, Phuong D. Dao, Luyl-Da Quach

TL;DR
LeafNet and LeafBench provide a large-scale multimodal dataset and benchmark for evaluating vision-language models in plant disease diagnosis, revealing significant performance gaps and highlighting the importance of multimodal approaches.
Contribution
Introduces the LeafNet dataset and the LeafBench benchmark, which enable systematic evaluation of vision-language models on plant pathology tasks.
Findings
Binary healthy-diseased classification exceeds 90% accuracy.
Fine-grained pathogen identification remains below 65%.
Multimodal models outperform vision-only models in diagnostic tasks.
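The accuracy figures above are reported per task. As a rough illustration only, here is a minimal Python sketch of how per-task accuracy could be computed for a question-answering benchmark, assuming answers are scored by case-insensitive exact match. The record keys (`task`, `prediction`, `answer`), the task names, and the toy records are hypothetical and are not taken from LeafBench, whose actual data format and scoring protocol are not described on this page.

```python
from collections import defaultdict


def per_task_accuracy(records):
    """Return {task: accuracy} for an iterable of answer records.

    Each record is a dict with the hypothetical keys 'task',
    'prediction', and 'answer'. Scoring is case-insensitive exact
    match, which is an assumption, not LeafBench's documented rule.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        task = record["task"]
        total[task] += 1
        predicted = record["prediction"].strip().lower()
        expected = record["answer"].strip().lower()
        if predicted == expected:
            correct[task] += 1
    return {task: correct[task] / total[task] for task in total}


# Toy records invented for illustration; these are not LeafBench data.
toy_records = [
    {"task": "healthy_vs_diseased", "prediction": "Diseased", "answer": "diseased"},
    {"task": "healthy_vs_diseased", "prediction": "healthy", "answer": "healthy"},
    {"task": "pathogen_id", "prediction": "fungus", "answer": "bacterium"},
    {"task": "pathogen_id", "prediction": "virus", "answer": "virus"},
]

scores = per_task_accuracy(toy_records)
for task, accuracy in sorted(scores.items()):
    print(f"{task}: {accuracy:.2f}")
```

Reporting accuracy per task rather than as one pooled number keeps an easy task, such as binary healthy-diseased classification, from masking weaker results on harder tasks such as fine-grained pathogen identification.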
Abstract
Foundation models and vision-language pre-training have significantly advanced Vision-Language Models (VLMs), enabling multimodal processing of visual and linguistic data. However, their application in domain-specific agricultural tasks, such as plant pathology, remains limited due to the lack of large-scale, comprehensive multimodal image-text datasets and benchmarks. To address this gap, we introduce LeafNet, a comprehensive multimodal dataset, and LeafBench, a visual question-answering benchmark developed to systematically evaluate the capabilities of VLMs in understanding plant diseases. The dataset comprises 186,000 digital leaf images spanning 97 disease classes, paired with metadata and used to generate 13,950 question-answer pairs covering six critical agricultural tasks. The questions assess various aspects of plant pathology understanding, including visual symptom recognition,…
Taxonomy
Topics: Smart Agriculture and AI · Advanced Neural Network Applications · Multimodal Machine Learning Applications
