AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment
Umair Nawaz, Muhammad Awais, Hanan Gani, Muzammal Naseer, Fahad Khan, Salman Khan, Rao Muhammad Anwer

TL;DR
AgriCLIP is a specialized vision-language model for agriculture and livestock that leverages a large domain-specific dataset and a combined training approach to improve zero-shot performance on related tasks.
Contribution
The paper introduces ALive, a new large-scale agricultural dataset, and a training pipeline that combines contrastive and self-supervised learning for domain-specific vision-language modeling.
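The contrastive half of such a pipeline is typically a CLIP-style symmetric loss over matched image-text pairs. The sketch below is a minimal pure-Python illustration of that standard objective, not the authors' implementation: the function names, the temperature value of 0.07, and the toy embeddings are all assumptions, and the self-supervised component mentioned above is not shown.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over an N x N similarity matrix.

    Row k of image_embs is assumed to be paired with row k of text_embs.
    The temperature is an illustrative default, not a value from the paper.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, scaled by temperature
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    # Image-to-text direction: each image should pick its own caption.
    loss_i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: each caption should pick its own image.
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(log_softmax(cols[k])[k] for k in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)

# Toy 2-D embeddings (hypothetical): correctly paired vs. swapped captions.
aligned = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is low when each image is most similar to its own caption and high when the pairing is wrong, which is the signal that pulls matching image and text embeddings together during training.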
Findings
Achieved a 7.8% improvement in zero-shot classification accuracy over standard CLIP.
Demonstrated effectiveness across 20 downstream agricultural and livestock tasks.
Made the dataset and code available for future research.
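Zero-shot classification, the setting these findings are measured in, can be sketched as follows: an image embedding is compared against the text embeddings of candidate class prompts, and the most similar prompt wins. This is a minimal illustration under stated assumptions. The embeddings are hand-written toy vectors rather than real encoder outputs, and the class names, the `zero_shot_classify` helper, and the temperature are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(image_emb, class_text_embs, temperature=0.01):
    """Return the best-matching class name and a softmax over all classes.

    class_text_embs maps a class name to the embedding of its text prompt.
    No classifier is trained: the prediction comes only from similarity.
    """
    names = list(class_text_embs)
    scores = [cosine(image_emb, class_text_embs[n]) / temperature for n in names]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = {n: e / total for n, e in zip(names, exps)}
    return max(probs, key=probs.get), probs

# Toy embeddings (hypothetical); a real system would obtain these
# from the model's text and image encoders.
class_text_embs = {
    "healthy leaf": [0.9, 0.1, 0.0],
    "nitrogen deficiency": [0.1, 0.9, 0.2],
}
image_emb = [0.2, 0.8, 0.1]

label, probs = zero_shot_classify(image_emb, class_text_embs)
print(label)
```

Because the class set is defined only by text prompts, the same model can be evaluated on new downstream tasks by swapping the prompt list, with no task-specific training.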
Abstract
Capitalizing on the vast amount of image-text data, large-scale vision-language pre-training has demonstrated remarkable zero-shot capabilities and has been utilized in several applications. However, models trained on general everyday web-crawled data often exhibit sub-optimal performance for specialized domains, likely due to domain shift. Recent works have tackled this problem for some domains (e.g., healthcare) by constructing domain-specialized image-text data. However, constructing a dedicated large-scale image-text dataset for the sustainable area of agriculture and livestock is still open to research. Further, this domain requires fine-grained feature learning due to the subtle nature of the downstream tasks (e.g., nutrient deficiency detection, livestock breed classification). To address this, we present AgriCLIP, a vision-language foundational model dedicated to the domain of agriculture…
Taxonomy
Topics: Cooperative Studies and Economics
Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training
