ViewDelta: Scaling Scene Change Detection through Text-Conditioning
Subin Varghese, Joshua Gao, Vedhus Hoskere

TL;DR
ViewDelta introduces a text-conditioned framework for scene change detection that lets a single model generalize across diverse datasets and applications by using natural language prompts to define precisely which changes are relevant.
Contribution
The paper presents ViewDelta, a novel text-conditioned change detection model, together with the CSeg dataset; combined, they allow joint training across multiple datasets for improved generalization.
Findings
A single ViewDelta model outperforms dataset-specific models.
Text conditioning improves generalization across datasets.
The large-scale synthetic CSeg dataset facilitates training and evaluation.
Abstract
We introduce a generalized framework for Scene Change Detection (SCD) that addresses the core ambiguity of distinguishing "relevant" from "nuisance" changes, enabling effective joint training of a single model across diverse domains and applications. Existing methods struggle to generalize due to differences in dataset labeling, where changes such as vegetation growth or lane marking alterations may be labeled as relevant in one dataset and irrelevant in another. To resolve this ambiguity, we propose ViewDelta, a text-conditioned change detection framework that uses natural language prompts to define relevant changes precisely, such as a single attribute, a specific set of classes, or all observable differences. To facilitate training in this paradigm, we release the Conditional Change Segmentation dataset (CSeg), the first large-scale synthetic dataset for text-conditioned SCD,…
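To make the labeling ambiguity concrete, here is a toy sketch (our illustration, not ViewDelta's implementation) of how one before/after scene pair yields different ground-truth change masks depending on what the prompt declares relevant. It assumes hypothetical per-pixel semantic label maps and a hand-written, hypothetical `PROMPT_CLASSES` mapping from prompt to class set; the paper's model consumes natural language prompts directly, which this sketch does not attempt to reproduce.

```python
from typing import Optional, Set

Grid = list[list[str]]

# Hypothetical mapping from a prompt to the classes it declares relevant.
# None stands in for "all observable differences".
PROMPT_CLASSES: dict[str, Optional[Set[str]]] = {
    "all observable differences": None,
    "vehicles": {"car"},
    "vegetation": {"grass", "shrub"},
}


def conditional_change_mask(before: Grid, after: Grid,
                            relevant: Optional[Set[str]] = None) -> list[list[int]]:
    """Binary change mask restricted to the classes named by the prompt.

    A cell counts as changed if its class differs between the two scenes and,
    when a relevant set is given, the old or new class belongs to that set.
    """
    mask = []
    for row_b, row_a in zip(before, after):
        mask_row = []
        for b, a in zip(row_b, row_a):
            changed = b != a
            if relevant is not None:
                changed = changed and (b in relevant or a in relevant)
            mask_row.append(int(changed))
        mask.append(mask_row)
    return mask


# Hypothetical 2x3 scene: a lane marking appears, grass grows into shrubs,
# and a parked car leaves.
before = [["road", "road", "grass"],
          ["car", "road", "grass"]]
after = [["road", "lane_marking", "shrub"],
         ["road", "road", "shrub"]]

for prompt, classes in PROMPT_CLASSES.items():
    print(prompt, conditional_change_mask(before, after, classes))
# all observable differences [[0, 1, 1], [1, 0, 1]]
# vehicles [[0, 0, 0], [1, 0, 0]]
# vegetation [[0, 0, 1], [0, 0, 1]]
```

The same image pair produces three different targets, which is why datasets that silently adopt different conventions conflict under joint training, while an explicit condition keeps each target well defined.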
Taxonomy
Topics: Image Retrieval and Classification Techniques
