TL;DR
SCOLD is a vision-language model for leaf disease identification. It combines a domain-specific dataset of over 186,000 image-caption pairs with soft-target contrastive learning to improve generalization and robustness in agricultural diagnostics.
Contribution
This work introduces SCOLD, a domain-specific, context-aware vision-language foundation model trained on a large plant leaf dataset, with soft-target contrastive learning to enhance fine-grained classification.
Findings
SCOLD outperforms existing models on zero-shot and few-shot tasks.
SCOLD demonstrates superior robustness and generalization in plant disease identification.
Ablation studies confirm the effectiveness of soft-target contrastive learning.
Abstract
Leaf disease identification plays a pivotal role in smart agriculture. However, many existing studies still struggle to integrate image and textual modalities to compensate for each other's limitations. Furthermore, many of these approaches rely on pretraining with constrained datasets such as ImageNet, which lack domain-specific information. We propose SCOLD (Soft-target COntrastive learning for Leaf Disease identification), a context-aware vision-language foundation model tailored to address these challenges for agricultural tasks. SCOLD is developed using a diverse corpus of plant leaf images and corresponding symptom descriptions, comprising over 186,000 image-caption pairs aligned with 97 unique concepts. Through task-agnostic pretraining, SCOLD leverages contextual soft targets to mitigate overconfidence in contrastive learning by smoothing labels, thereby improving model…
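The abstract describes soft targets only at a high level, so the following is a minimal sketch of one way label smoothing can be applied to a CLIP-style contrastive loss, not SCOLD's actual formulation. The mixing weight `alpha`, the `temperature` value, the use of caption-to-caption similarity as the "context", and all function names are assumptions made for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def soft_targets(text_emb, alpha=0.1, temperature=0.07):
    """Assumed form: blend one-hot targets with a caption-similarity distribution."""
    n = len(text_emb)
    targets = []
    for i in range(n):
        context = softmax([dot(text_emb[i], text_emb[j]) / temperature
                           for j in range(n)])
        targets.append([(1 - alpha) * (1.0 if i == j else 0.0) + alpha * context[j]
                        for j in range(n)])
    return targets

def cross_entropy(logits, targets):
    """Mean cross-entropy between softmax(logits) and soft target rows."""
    total = 0.0
    for logit_row, target_row in zip(logits, targets):
        m = max(logit_row)
        log_z = m + math.log(sum(math.exp(x - m) for x in logit_row))
        total -= sum(t * (x - log_z) for x, t in zip(logit_row, target_row))
    return total / len(logits)

def soft_contrastive_loss(image_emb, text_emb, alpha=0.1, temperature=0.07):
    """Symmetric image-to-text and text-to-image loss with smoothed targets."""
    img = [normalize(v) for v in image_emb]
    txt = [normalize(v) for v in text_emb]
    n = len(img)
    logits = [[dot(img[i], txt[j]) / temperature for j in range(n)]
              for i in range(n)]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    targets = soft_targets(txt, alpha, temperature)
    return 0.5 * (cross_entropy(logits, targets) + cross_entropy(logits_t, targets))

# Toy batch of three image/caption embedding pairs (made-up values).
images = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
captions = [[1.0, 0.1], [0.1, 1.0], [1.0, 0.0]]
loss = soft_contrastive_loss(images, captions, alpha=0.1)
```

With `alpha=0` the targets reduce to the identity matrix and the loss becomes the standard hard-label contrastive objective. A larger `alpha` shifts probability mass toward captions that resemble the paired one, which is one way to penalize the model less for confusing near-duplicate symptom descriptions.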
Taxonomy
Methods: Contrastive Learning
