CropVLM: A Domain-Adapted Vision-Language Model for Open-Set Crop Analysis

Abderrahmene Boudiaf; Sajd Javed

arXiv:2605.03259·cs.CV·May 6, 2026

TL;DR

CropVLM is a domain-adapted vision-language model designed for open-set crop analysis, enabling scalable, zero-shot plant phenotyping and detection without extensive species-specific training.

Contribution

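We introduce CropVLM, a vision-language model adapted to agriculture through Domain-Specific Semantic Alignment (DSSA), and HOS-Net, a Hybrid Open-Set Localization Network for open-set crop detection. Together they aim to make plant phenotyping scalable without species-specific training.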

Findings

01. Achieves 72.51% zero-shot classification accuracy, outperforming baselines.

02. Demonstrates superior zero-shot detection with 49.17 AP50 on CVTCropDet.

03. Outperforms existing methods on tropical fruit species detection.
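For readers unfamiliar with the AP50 metric quoted above: it is average precision computed with an intersection-over-union (IoU) threshold of 0.5, so a predicted box counts as a match only if it overlaps a ground-truth box by at least that much. The sketch below shows only the IoU test. It is a generic illustration, not the paper's evaluation code. The function names and the `(x1, y1, x2, y2)` box format are assumptions, and whether the boundary case of exactly 0.5 counts as a match varies between benchmarks.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_iou50(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: True if the prediction overlaps the ground truth enough."""
    return iou(pred_box, gt_box) >= threshold


# Toy boxes (made up for illustration).
gt = (0, 0, 10, 10)
close_pred = (2, 0, 12, 10)   # shifted by 2 px
loose_pred = (5, 0, 15, 10)   # shifted by 5 px
print(matches_at_iou50(close_pred, gt), matches_at_iou50(loose_pred, gt))
```

A full AP50 computation would additionally rank detections by confidence and integrate the precision-recall curve; that part is omitted here.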

Abstract

High-throughput plant phenotyping, the quantitative measurement of observable plant traits, is critical for modern breeding but remains constrained by a "phenotyping bottleneck," where manual data collection is labor-intensive and prone to observer bias. Conventional closed-set computer vision systems fail to address this challenge, as they require extensive species-specific annotation and lack the flexibility to handle diverse breeding populations. To bridge this gap, we present CropVLM, a Vision-Language Model (VLM) adapted for the agricultural domain via Domain-Specific Semantic Alignment (DSSA). Trained on 52,987 manually selected image-caption pairs covering 37 species in natural field conditions, CropVLM effectively maps agronomic terminology to fine-grained visual features. We further introduce the Hybrid Open-Set Localization Network (HOS-Net), an architecture that integrates…
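To make "zero-shot classification" concrete, here is a minimal sketch of how a vision-language model with separate image and text encoders can label an image without class-specific training: embed one text prompt per candidate class, embed the image, and pick the prompt with the highest cosine similarity. This assumes a CLIP-style dual-encoder setup, which the abstract does not confirm for CropVLM. The embeddings, prompts, `temperature` value, and function names are all made up for illustration; in practice the vectors would come from the model's encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_emb, text_embs, temperature=0.07):
    """Return the best-matching prompt and a softmax distribution over prompts.

    text_embs maps each class prompt to its (hypothetical) text embedding.
    """
    prompts = list(text_embs)
    logits = [cosine(image_emb, text_embs[p]) / temperature for p in prompts]
    # Numerically stable softmax over the similarity logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = {p: e / total for p, e in zip(prompts, exps)}
    best = max(probs, key=probs.get)
    return best, probs


# Toy 3-dimensional embeddings standing in for real encoder outputs.
text_embs = {
    "a photo of a mango tree": [0.9, 0.1, 0.0],
    "a photo of a date palm": [0.1, 0.9, 0.2],
}
image_emb = [0.8, 0.2, 0.1]

label, probs = zero_shot_classify(image_emb, text_embs)
print(label)
```

Because the class set is defined only by the text prompts, adding a new species in this setup means adding a new prompt rather than collecting and annotating new training images.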

Code & Models

Repositories

boudiafA/CropVLM (GitHub)

Models

boudiafA/CropVLM (Hugging Face model)
