A Vision-Language Foundation Model for Leaf Disease Identification

Khang Nguyen Quoc; Lan Le Thi Thu; Luyl-Da Quach

arXiv:2505.07019 · cs.CV · October 28, 2025

TL;DR

SCOLD is a novel vision-language model specifically designed for leaf disease identification, leveraging a large domain-specific dataset and soft-target contrastive learning to improve generalization and robustness in agricultural diagnostics.

Contribution

This work introduces SCOLD, a domain-specific, context-aware vision-language foundation model trained on a large plant leaf dataset, with soft-target contrastive learning to enhance fine-grained classification.

Findings

01. SCOLD outperforms existing models on zero-shot and few-shot tasks.

02. SCOLD demonstrates superior robustness and generalization in plant disease identification.

03. Ablation studies confirm the effectiveness of soft-target contrastive learning.
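Zero-shot classification with a vision-language model usually works by embedding the image and one text prompt per candidate class into a shared space, then choosing the class whose text embedding is most similar to the image. The snippet below is a generic CLIP-style sketch of that step, not SCOLD's evaluation code. It assumes the embeddings were already produced by trained image and text encoders; the function name `zero_shot_classify`, the class names, and the toy vectors are hypothetical.

```python
import numpy as np

def zero_shot_classify(image_emb, class_text_embs, class_names):
    """Return the class whose text embedding is closest to the image embedding.

    Hypothetical helper: assumes `image_emb` (D,) and `class_text_embs` (C, D)
    come from image/text encoders that share one embedding space.
    """
    # L2-normalize both sides so the dot product is a cosine similarity.
    img = image_emb / np.linalg.norm(image_emb)
    txt = class_text_embs / np.linalg.norm(class_text_embs, axis=1, keepdims=True)
    sims = txt @ img  # one cosine similarity per class
    return class_names[int(np.argmax(sims))], sims
```

For example, `zero_shot_classify(np.array([0.1, 0.9, 0.2]), np.eye(3), ["healthy", "rust", "blight"])` returns `"rust"` as the label, because the toy image vector points mostly along the second class direction. No classifier head is trained here: new classes only need new text prompts.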

Abstract

Leaf disease identification plays a pivotal role in smart agriculture. However, many existing studies still struggle to integrate image and textual modalities to compensate for each other's limitations. Furthermore, many of these approaches rely on pretraining with constrained datasets such as ImageNet, which lack domain-specific information. We propose SCOLD (Soft-target COntrastive learning for Leaf Disease identification), a context-aware vision-language foundation model tailored to address these challenges for agricultural tasks. SCOLD is developed using a diverse corpus of plant leaf images and corresponding symptom descriptions, comprising over 186,000 image-caption pairs aligned with 97 unique concepts. Through task-agnostic pretraining, SCOLD leverages contextual soft targets to mitigate overconfidence in contrastive learning by smoothing labels, thereby improving model…
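The abstract describes soft targets that smooth the labels to reduce overconfidence in contrastive learning. As a minimal sketch, assuming plain uniform label smoothing applied to a CLIP-style symmetric contrastive loss, the idea can be written in NumPy as follows. SCOLD's contextual soft targets may be constructed differently; the function names and the `temperature=0.07` and `smoothing=0.1` defaults are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def soft_target_contrastive_loss(img_emb, txt_emb, temperature=0.07, smoothing=0.1):
    """Symmetric image-text contrastive loss with label-smoothed targets.

    Assumption: uniform smoothing over the other captions in the batch
    (batch size N >= 2); this is an illustration, not SCOLD's exact loss.
    """
    # L2-normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N); diagonal holds matched pairs
    n = logits.shape[0]
    # Soft targets: 1 - smoothing on the matched pair, the rest spread evenly.
    targets = np.full((n, n), smoothing / (n - 1))
    np.fill_diagonal(targets, 1.0 - smoothing)
    # Cross-entropy in both directions: image->text rows, text->image rows.
    loss_i2t = -(targets * log_softmax(logits, axis=1)).sum(axis=1).mean()
    loss_t2i = -(targets * log_softmax(logits.T, axis=1)).sum(axis=1).mean()
    return (loss_i2t + loss_t2i) / 2
```

With `smoothing=0.0` the targets become the identity matrix and the function reduces to the standard hard-label InfoNCE loss. With smoothing enabled, even perfectly separated image-caption pairs cannot drive the loss to zero, so the model is penalized for pushing matched-pair logits arbitrarily far above the rest.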

Code & Models

Repositories

https://huggingface.co/enalis/scold (official)

Taxonomy

Methods: Contrastive Learning