ViewDelta: Scaling Scene Change Detection through Text-Conditioning

Subin Varghese; Joshua Gao; Vedhus Hoskere

arXiv:2412.07612·cs.CV·August 14, 2025

TL;DR

ViewDelta introduces a text-conditioned framework for scene change detection in which natural language prompts define which changes count as relevant, so that a single model can generalize across diverse datasets and applications.

Contribution

The paper presents ViewDelta, a text-conditioned change detection model, and the CSeg dataset, which together allow joint training across multiple datasets for improved generalization.

Findings

01 A single ViewDelta model outperforms dataset-specific models.

02 Text conditioning improves generalization across datasets.

03 A large-scale synthetic dataset (CSeg) facilitates training and evaluation.

Abstract

We introduce a generalized framework for Scene Change Detection (SCD) that addresses the core ambiguity of distinguishing "relevant" from "nuisance" changes, enabling effective joint training of a single model across diverse domains and applications. Existing methods struggle to generalize due to differences in dataset labeling, where changes such as vegetation growth or lane marking alterations may be labeled as relevant in one dataset and irrelevant in another. To resolve this ambiguity, we propose ViewDelta, a text-conditioned change detection framework that uses natural language prompts to define relevant changes precisely, such as a single attribute, a specific set of classes, or all observable differences. To facilitate training in this paradigm, we release the Conditional Change Segmentation dataset (CSeg), the first large-scale synthetic dataset for text-conditioned SCD,…
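The labeling ambiguity described in the abstract can be illustrated with a toy example. The sketch below is not ViewDelta, which is a learned model conditioned on free-form text; it only shows how the same image pair can yield different target change masks depending on which classes a prompt declares relevant. The class names, the `CLASSES` table, and the `target_mask` helper are all hypothetical and are not taken from the paper or from CSeg.

```python
# Hypothetical class ids for a toy per-pixel label map (not from CSeg).
CLASSES = {"background": 0, "vegetation": 1, "lane_marking": 2, "vehicle": 3}


def target_mask(before, after, relevant=None):
    """Mark changed pixels whose class, before or after, is relevant.

    `before` and `after` are equally sized 2D lists of class ids.
    `relevant` is a list of class names; None stands for a prompt
    asking for all observable differences.
    """
    ids = None if relevant is None else {CLASSES[name] for name in relevant}
    mask = []
    for row_b, row_a in zip(before, after):
        mask.append([
            b != a and (ids is None or b in ids or a in ids)
            for b, a in zip(row_b, row_a)
        ])
    return mask


# One image pair: a vehicle appears, a lane marking is removed,
# and vegetation grows into a previously empty pixel.
before = [[0, 0, 1],
          [2, 0, 0]]
after = [[0, 3, 1],
         [0, 0, 1]]

all_changes = target_mask(before, after)                     # "all differences"
vehicles_only = target_mask(before, after, ["vehicle"])      # "vehicle changes"
vegetation_only = target_mask(before, after, ["vegetation"])  # "vegetation changes"
```

Here the three masks differ even though the input pair is identical, which is the situation that makes fixed-label datasets hard to train on jointly and that a text prompt is meant to disambiguate.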

Code & Models

Models

hoskerelab/ViewDelta · model

Datasets

hoskerelab/CSeg · dataset · 43 downloads

Taxonomy

Topics: Image Retrieval and Classification Techniques