TL;DR
This paper introduces HZNU-FCD, a farmland semantic change detection benchmark, and a collaborative framework that pairs a task-driven small visual model with a frozen large vision-language model, reporting a 97.63% F1 score on the benchmark.
Contribution
The paper presents HZNU-FCD, a large-scale farmland change detection dataset of 4,588 bitemporal image pairs with pixel-level labels, and a collaborative framework that integrates a small dense change model with a large CLIP-based model.
Findings
Achieved a 97.63% F1 score on the HZNU-FCD benchmark.
Improved F1 by 10.19 percentage points over ChangeCLIP-ViT.
Demonstrated strong generalization on multiple datasets.
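To make the reported numbers concrete, the following is a minimal sketch of how a pixel-level F1 score can be computed for a binary change mask, and of what a gain in percentage points implies. The function name `f1_score`, the toy masks, and the binary (change vs. no change) setting are assumptions made for illustration; the paper may average F1 differently across its five classes. The implied baseline value is plain arithmetic on the two figures above, not a number stated in the source.

```python
def f1_score(pred, target):
    """Pixel-level F1 for binary change masks given as flat sequences of 0/1.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    if tp == 0:
        # No correctly detected change pixels: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: 6 pixels, flattened predicted and reference change masks.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
toy_f1 = f1_score(pred, target)

# A gain in percentage points is an absolute difference between two scores.
reported_f1 = 97.63  # F1 (%) reported on HZNU-FCD
gain_pp = 10.19      # reported gain over ChangeCLIP-ViT, in percentage points
implied_baseline = round(reported_f1 - gain_pp, 2)  # derived, not from the source
```

Note that a gain of 10.19 percentage points is an absolute difference between two F1 values, which is not the same as a 10.19% relative improvement.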
Abstract
Farmland Semantic Change Detection (SCD) is essential for cultivated land protection, yet existing benchmarks and models remain insufficient for fine-grained farmland conversion monitoring. Current datasets often lack dedicated "from-to" annotations, while visual change detection models are easily disturbed by phenology-induced pseudo-changes caused by crop rotation, seasonal variation, and illumination differences. To address these challenges, we construct HZNU-FCD, a large-scale fine-grained farmland SCD benchmark with a unified five-class farmland-to-non-farmland annotation protocol. It contains 4,588 bitemporal image pairs with pixel-level labels for practical farmland protection. Based on this benchmark, we propose a large-small collaborative SCD framework that integrates a task-driven small visual model with a frozen large vision-language model. The small model, Fine-grained…
