LeafNet: A Large-Scale Dataset and Comprehensive Benchmark for Foundational Vision-Language Understanding of Plant Diseases

Khang Nguyen Quoc; Phuong D. Dao; Luyl-Da Quach

arXiv:2602.13662·cs.CV·February 23, 2026

Open Access · 1 dataset

TL;DR

LeafNet and LeafBench provide a large-scale multimodal dataset and benchmark for evaluating vision-language models on plant disease diagnosis. The evaluation reveals a gap between binary healthy-diseased classification (above 90% accuracy) and fine-grained pathogen identification (below 65%), and shows that multimodal models outperform vision-only models.

Contribution

Introduces the LeafNet dataset and the LeafBench benchmark, which enable systematic evaluation of vision-language models on plant pathology tasks.

Findings

01. Binary healthy-diseased classification exceeds 90% accuracy.

02. Fine-grained pathogen identification remains below 65%.

03. Multimodal models outperform vision-only models in diagnostic tasks.
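
The findings above are reported as accuracy per task. A minimal sketch of how per-task accuracy could be computed over question-answer records is given here. The record fields (`task`, `answer`, `prediction`), the task names, and the toy records are all hypothetical, chosen only for illustration; they are not taken from the LeafBench release, and the benchmark's actual scoring procedure may differ.

```python
from collections import defaultdict


def per_task_accuracy(records):
    """Return {task: fraction of correct predictions} for a list of QA records.

    Each record is a dict with hypothetical keys 'task', 'answer', 'prediction'.
    Answers are compared after trimming whitespace and lowercasing.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        task = rec["task"]
        total[task] += 1
        if rec["prediction"].strip().lower() == rec["answer"].strip().lower():
            correct[task] += 1
    return {task: correct[task] / total[task] for task in total}


# Toy records (made up for illustration, not real benchmark data).
records = [
    {"task": "healthy_vs_diseased", "answer": "diseased", "prediction": "Diseased"},
    {"task": "healthy_vs_diseased", "answer": "healthy", "prediction": "healthy"},
    {"task": "pathogen_identification", "answer": "B", "prediction": "A"},
    {"task": "pathogen_identification", "answer": "B", "prediction": "b "},
]

scores = per_task_accuracy(records)
for task, acc in scores.items():
    print(f"{task}: {acc:.2%}")
```

Reporting accuracy separately for each task, rather than as one pooled number, is what makes a gap like the one between binary classification and pathogen identification visible.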

Abstract

Foundation models and vision-language pre-training have significantly advanced Vision-Language Models (VLMs), enabling multimodal processing of visual and linguistic data. However, their application in domain-specific agricultural tasks, such as plant pathology, remains limited due to the lack of large-scale, comprehensive multimodal image-text datasets and benchmarks. To address this gap, we introduce LeafNet, a comprehensive multimodal dataset, and LeafBench, a visual question-answering benchmark developed to systematically evaluate the capabilities of VLMs in understanding plant diseases. The dataset comprises 186,000 digital leaf images spanning 97 disease classes, paired with metadata, generating 13,950 question-answer pairs spanning six critical agricultural tasks. The questions assess various aspects of plant pathology understanding, including visual symptom recognition,…

Code & Models

Datasets

enalis/LeafNet
dataset · 692 downloads

Taxonomy

Topics: Smart Agriculture and AI · Advanced Neural Network Applications · Multimodal Machine Learning Applications