LatentDiff: Scaling Semantic Dataset Comparison to Millions of Images
James Flora, Kowshik Thopalli, Akshay R. Kulkarni, Weng-Keen Wong, Shusen Liu

TL;DR
LatentDiff is a scalable, efficient framework for semantic dataset comparison that works in the latent space of vision encoders. It outperforms caption-based methods, especially when the distribution shift between datasets is small.
Contribution
The paper introduces LatentDiff, which combines sparse autoencoder-based divergence testing with density ratio estimation, and Noisy-Diff, a new benchmark for realistic sparse dataset shifts.
Findings
LatentDiff achieves higher accuracy than existing methods.
It remains robust when only a very small fraction of images (from 5% down to less than 1%) differ semantically.
LatentDiff is computationally more efficient than caption-based alternatives.
Abstract
We present LatentDiff, a scalable framework for semantic dataset comparison that operates directly in the latent space of pretrained vision encoders. By combining sparse autoencoder-based divergence testing with density ratio estimation, LatentDiff identifies interpretable semantic differences between datasets at a fraction of the computational cost of caption-based alternatives. We also introduce Noisy-Diff, a benchmark capturing realistic sparse distribution shifts that cause existing methods to struggle. Experiments demonstrate that LatentDiff achieves superior accuracy while remaining robust in settings where an extremely small fraction of images (from 5% down to less than 1%) differ semantically.
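Illustrative sketch
The abstract does not specify how the divergence test is carried out, so the following is only a minimal sketch of one plausible form it could take, not the authors' implementation. It assumes each image has already been mapped to a vector of sparse autoencoder activations, and it compares how often each latent feature fires in dataset A versus dataset B using a standard two-proportion z-test. The function name firing_rate_z_scores, the threshold parameter, and the toy data are all hypothetical. The density ratio estimation step is not covered here.

```python
import math

def firing_rate_z_scores(acts_a, acts_b, threshold=0.0):
    """Two-proportion z-test per latent feature (hypothetical helper).

    acts_a, acts_b: lists of per-image activation vectors (lists of floats),
    all of the same width. A feature "fires" on an image when its activation
    exceeds `threshold`. Returns one z-score per feature; a positive score
    means the feature fires more often in dataset A than in dataset B.
    """
    n_a, n_b = len(acts_a), len(acts_b)
    n_features = len(acts_a[0])
    scores = []
    for j in range(n_features):
        k_a = sum(1 for row in acts_a if row[j] > threshold)
        k_b = sum(1 for row in acts_b if row[j] > threshold)
        p_a, p_b = k_a / n_a, k_b / n_b
        # Pooled firing rate under the null hypothesis of no difference.
        pooled = (k_a + k_b) / (n_a + n_b)
        var = pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b)
        scores.append((p_a - p_b) / math.sqrt(var) if var > 0 else 0.0)
    return scores

# Toy data (assumed): feature 0 fires on half of both datasets,
# feature 1 fires on 3% of dataset A and never on dataset B.
acts_a = [[1.0 if i % 2 == 0 else 0.0, 1.0 if i < 30 else 0.0] for i in range(1000)]
acts_b = [[1.0 if i % 2 == 0 else 0.0, 0.0] for i in range(1000)]
z = firing_rate_z_scores(acts_a, acts_b)
```

In this toy setup the shared feature receives a z-score of zero, while the feature present in only 3% of dataset A receives a large positive score, which mirrors the sparse-shift regime the abstract describes. A real pipeline would also need a multiple-testing correction across features, which this sketch omits.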
