UniVCD: A New Method for Unsupervised Change Detection in the Open-Vocabulary Era
Ziqiang Zhu, Bowei Yang

TL;DR
UniVCD is an unsupervised, open-vocabulary change detection method built on frozen vision foundation models (SAM2 and CLIP). It detects changes across diverse scenes without any labeled data and reports state-of-the-art results on public benchmarks.
Contribution
UniVCD pairs frozen SAM2 and CLIP models with a lightweight feature alignment module, yielding a novel unsupervised approach to open-vocabulary change detection.
Findings
Achieves state-of-the-art performance on public benchmarks.
Effectively detects category-agnostic changes without labeled data.
Surpasses existing open-vocabulary change detection methods.
Abstract
Change detection (CD) identifies scene changes from multi-temporal observations and is widely used in urban development and environmental monitoring. Most existing CD methods rely on supervised learning, making performance strongly dataset-dependent and incurring high annotation costs; they typically focus on a few predefined categories and generalize poorly to diverse scenes. With the rise of vision foundation models such as SAM2 and CLIP, new opportunities have emerged to relax these constraints. We propose Unified Open-Vocabulary Change Detection (UniVCD), an unsupervised, open-vocabulary change detection method built on frozen SAM2 and CLIP. UniVCD detects category-agnostic changes across diverse scenes and imaging geometries without any labeled data or paired change images. A lightweight feature alignment module is introduced to bridge the spatially detailed representations from…
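The abstract is truncated before it specifies how UniVCD turns aligned features into change maps, so the sketch below is not the authors' method. It is a generic illustration of two decision rules commonly used with CLIP-style embeddings. It assumes each date already has an H×W grid of per-pixel embeddings in the same space as a text prompt embedding; in UniVCD that role would presumably be played by the aligned SAM2/CLIP features. All function names, thresholds, and toy vectors are hypothetical.

- **Open-vocabulary rule:** flag a pixel when it matches the queried category (for example "building") at exactly one of the two dates.
- **Category-agnostic rule:** flag a pixel when its embedding moves far enough between the two dates.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
FeatureMap = List[List[Vector]]  # H x W grid of D-dimensional per-pixel embeddings


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def category_mask(features: FeatureMap, text_emb: Vector,
                  threshold: float) -> List[List[bool]]:
    """Hypothetical: mark pixels whose embedding is close to the text prompt embedding."""
    return [[cosine(px, text_emb) >= threshold for px in row] for row in features]


def open_vocab_change(feat_t1: FeatureMap, feat_t2: FeatureMap,
                      text_emb: Vector, threshold: float = 0.5) -> List[List[bool]]:
    """Hypothetical rule: a pixel changed with respect to the queried category
    if it matches the prompt at exactly one date (the object appeared or vanished)."""
    m1 = category_mask(feat_t1, text_emb, threshold)
    m2 = category_mask(feat_t2, text_emb, threshold)
    return [[a != b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def agnostic_change(feat_t1: FeatureMap, feat_t2: FeatureMap,
                    threshold: float = 0.5) -> List[List[bool]]:
    """Hypothetical rule: flag pixels whose cosine distance between dates exceeds a threshold."""
    return [[1.0 - cosine(a, b) > threshold for a, b in zip(r1, r2)]
            for r1, r2 in zip(feat_t1, feat_t2)]


if __name__ == "__main__":
    # Toy 1x2 scene with 2-D embeddings; [1, 0] stands in for a "building" prompt.
    t1 = [[[1.0, 0.0], [0.0, 1.0]]]  # pixel 0 is building, pixel 1 is not
    t2 = [[[1.0, 0.0], [1.0, 0.0]]]  # both pixels are building at the later date
    print(open_vocab_change(t1, t2, [1.0, 0.0]))  # [[False, True]]
    print(agnostic_change(t1, t2))                # [[False, True]]
```

The category rule answers "where did *buildings* change?" for any prompt. The distance rule needs no prompt at all, which matches the category-agnostic setting the abstract describes. Both thresholds are arbitrary here and would need calibration on real features.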
Taxonomy
Topics: Remote-Sensing Image Classification · Remote Sensing in Agriculture · Geographic Information Systems Studies
