Vision-Language Semantic Grounding for Multi-Domain Crop-Weed Segmentation

Nazia Hossain; Xintong Jiang; Yu Tian; Philippe Seguin; O. Grant Clark; Shangpeng Sun

arXiv:2602.23677·cs.CV·March 2, 2026

TL;DR

This paper introduces VL-WS, a vision-language grounded segmentation framework that generalizes across diverse agricultural environments by grounding pixel-level segmentation in semantically aligned, domain-invariant features. It reaches a 91.64% mean Dice score on benchmark datasets, 4.98% above the CNN baseline.

Contribution

VL-WS fuses frozen CLIP embeddings with task-specific spatial features and modulates them with FiLM layers conditioned on natural language captions, enabling cross-domain generalization and label-efficient weed segmentation in precision agriculture.

Findings

01

Achieves 91.64% mean Dice score on benchmark datasets.

02

Outperforms the CNN baseline by 4.98% in Dice score.

03

Improves weed-class segmentation, reaching an 80.45% Dice score.
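
For readers unfamiliar with the metric behind these findings, the following is a minimal sketch of how a per-class and mean Dice score can be computed from predicted and ground-truth label maps. The class ids (0 = background, 1 = crop, 2 = weed), the flat label lists, and the handling of absent classes are illustrative assumptions; the paper's exact evaluation protocol is not described on this page.

```python
def dice_score(pred, target, cls):
    """Dice coefficient for one class over two equal-length label sequences.

    Dice = 2 * |P ∩ T| / (|P| + |T|), where P and T are the sets of pixels
    assigned to `cls` in the prediction and the ground truth.
    """
    pred_pos = [p == cls for p in pred]
    target_pos = [t == cls for t in target]
    intersection = sum(p and t for p, t in zip(pred_pos, target_pos))
    denom = sum(pred_pos) + sum(target_pos)
    if denom == 0:
        # Assumption: a class absent from both maps counts as perfect agreement.
        return 1.0
    return 2.0 * intersection / denom


def mean_dice(pred, target, classes):
    """Unweighted mean of per-class Dice scores (an assumed averaging scheme)."""
    return sum(dice_score(pred, target, c) for c in classes) / len(classes)


# Hypothetical flattened label maps: 0 = background, 1 = crop, 2 = weed.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 0, 1, 2, 2, 2]

weed_dice = dice_score(pred, target, cls=2)
overall = mean_dice(pred, target, classes=[0, 1, 2])
print(f"weed Dice = {weed_dice:.4f}, mean Dice = {overall:.4f}")
```

In this toy case one weed pixel is mislabeled as crop, which lowers both the crop and the weed Dice while background stays perfect. That is why a mean Dice can sit well above the weed-class Dice, as in the reported 91.64% versus 80.45%.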

Abstract

Fine-grained crop-weed segmentation is essential for enabling targeted herbicide application in precision agriculture. However, existing deep learning models struggle to generalize across heterogeneous agricultural environments due to reliance on dataset-specific visual features. We propose Vision-Language Weed Segmentation (VL-WS), a novel framework that addresses this limitation by grounding pixel-level segmentation in semantically aligned, domain-invariant representations. Our architecture employs a dual-encoder design, where frozen Contrastive Language-Image Pretraining (CLIP) embeddings and task-specific spatial features are fused and modulated via Feature-wise Linear Modulation (FiLM) layers conditioned on natural language captions. This design enables image-level textual descriptions to guide channel-wise feature refinement while preserving fine-grained spatial localization.…
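
The channel-wise modulation mentioned in the abstract can be illustrated with the generic FiLM operation, in which a conditioning vector predicts a per-channel scale and shift. The sketch below is not the authors' implementation: the function name `film`, the single linear projection per parameter, the tensor shapes, and the random vector standing in for a frozen CLIP caption embedding are all assumptions made for illustration.

```python
import numpy as np


def film(features, text_embedding, w_gamma, b_gamma, w_beta, b_beta):
    """Apply Feature-wise Linear Modulation to a feature map.

    features:        (C, H, W) spatial feature map
    text_embedding:  (D,) conditioning vector (e.g. a caption embedding)
    w_gamma, w_beta: (C, D) projection weights (hypothetical parameters)
    b_gamma, b_beta: (C,) projection biases (hypothetical parameters)
    """
    gamma = w_gamma @ text_embedding + b_gamma  # per-channel scale, shape (C,)
    beta = w_beta @ text_embedding + b_beta     # per-channel shift, shape (C,)
    # Broadcast over H and W: every spatial location in a channel gets the
    # same scale and shift, so spatial layout is left untouched.
    return gamma[:, None, None] * features + beta[:, None, None]


rng = np.random.default_rng(0)
C, H, W, D = 8, 4, 4, 16
features = rng.normal(size=(C, H, W))
caption_embedding = rng.normal(size=(D,))  # stand-in for a frozen CLIP text embedding

modulated = film(
    features,
    caption_embedding,
    w_gamma=rng.normal(size=(C, D)) * 0.1,
    b_gamma=np.ones(C),
    w_beta=rng.normal(size=(C, D)) * 0.1,
    b_beta=np.zeros(C),
)
print(modulated.shape)
```

Because the scale and shift are constant across each channel's spatial grid, the caption influences which channels are emphasized without altering where features are located, which is consistent with the abstract's claim that localization is preserved.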

Taxonomy

Topics: Smart Agriculture and AI · Advanced Neural Network Applications · Remote Sensing in Agriculture