LatentDiff: Scaling Semantic Dataset Comparison to Millions of Images

James Flora; Kowshik Thopalli; Akshay R. Kulkarni; Weng-Keen Wong; Shusen Liu

arXiv:2605.00899 · cs.CV · May 5, 2026

TL;DR

LatentDiff is a scalable, efficient framework for semantic dataset comparison in the latent space of vision encoders. It outperforms caption-based methods, especially when distribution shifts are small.

Contribution

It introduces LatentDiff, which combines sparse autoencoder-based divergence testing with density ratio estimation, and Noisy-Diff, a new benchmark of realistic, sparse dataset shifts.
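The summary names sparse autoencoder (SAE)-based divergence testing but does not describe the test. As a rough illustration only, the sketch below assumes SAE activations have already been computed for every image in two datasets. It counts how often each SAE feature fires in each dataset and ranks features with a two-proportion z-test. The helper names (`firing_rates`, `feature_divergence`), the activation threshold, and the choice of a z-test are hypothetical stand-ins, not LatentDiff's actual procedure.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical input format: one row per image, one column per SAE feature,
# holding that feature's activation for the image (assumed precomputed).
Activations = Sequence[Sequence[float]]


def firing_rates(acts: Activations, threshold: float = 0.0) -> List[float]:
    """Fraction of images in which each SAE feature is active (> threshold)."""
    counts = [0] * len(acts[0])
    for row in acts:
        for j, a in enumerate(row):
            if a > threshold:
                counts[j] += 1
    return [c / len(acts) for c in counts]


def feature_divergence(
    acts_a: Activations, acts_b: Activations, threshold: float = 0.0
) -> List[Tuple[int, float, float]]:
    """Two-proportion z-test per SAE feature (an assumed test, for illustration).

    Returns (feature_index, z, two_sided_p) sorted by |z| descending;
    z > 0 means the feature fires more often in dataset B than in dataset A.
    """
    n_a, n_b = len(acts_a), len(acts_b)
    rates_a = firing_rates(acts_a, threshold)
    rates_b = firing_rates(acts_b, threshold)
    results = []
    for j, (p_a, p_b) in enumerate(zip(rates_a, rates_b)):
        pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)
        se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
        # A feature that is always (or never) active in both datasets has se == 0.
        z = 0.0 if se == 0.0 else (p_b - p_a) / se
        p_value = math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| >= |z|), Z ~ N(0, 1)
        results.append((j, z, p_value))
    # Python's sort is stable, so features with equal |z| keep their index order.
    results.sort(key=lambda r: abs(r[1]), reverse=True)
    return results


if __name__ == "__main__":
    # Toy data: feature 1 fires in 10% of B's images and never in A's.
    acts_a = [[1.0, 0.0, float(i % 2 == 0)] for i in range(100)]
    acts_b = [[1.0, float(i < 10), float(i % 2 == 0)] for i in range(100)]
    for j, z, p in feature_divergence(acts_a, acts_b):
        print(f"feature {j}: z={z:+.2f}, p={p:.3g}")
```

In the toy run, feature 1 should rank first because it is the only feature whose firing rate differs between the two datasets. Working with per-feature firing rates keeps each test one-dimensional and tied to a single, potentially interpretable SAE feature. In practice, testing many features would call for a multiple-comparison correction.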

Findings

01

LatentDiff achieves superior accuracy over existing methods.

02

It remains robust when only a very small fraction of images (from 5% down to less than 1%) differ semantically.

03

LatentDiff is computationally more efficient than caption-based alternatives.

Abstract

We present LatentDiff, a scalable framework for semantic dataset comparison that operates directly in the latent space of pretrained vision encoders. By combining sparse autoencoder-based divergence testing with density ratio estimation, LatentDiff identifies interpretable semantic differences between datasets at a fraction of the computational cost of caption-based alternatives. We also introduce Noisy-Diff, a benchmark of realistic, sparse distribution shifts on which existing methods struggle. Experiments demonstrate that LatentDiff achieves superior accuracy and remains robust in settings where only a very small fraction of images (from 5% down to less than 1%) differ semantically.
