Diffusion Models as Data Mining Tools

Ioannis Siglidis; Aleksander Holynski; Alexei A. Efros; Mathieu Aubry; Shiry Ginosar

arXiv:2408.02752·cs.CV·August 7, 2024

Open Access

TL;DR

This paper finetunes conditional diffusion models on a dataset and uses them for visual data mining: the finetuned model defines a typicality measure that scores how typical a visual element is for a given label. The approach scales to large and diverse datasets.

Contribution

The paper presents an analysis-by-synthesis approach that uses generative diffusion models for data mining and handles diverse datasets without explicitly comparing all pairs of visual elements.

Findings

01. Effective typicality measure for visual data

02. Scalable analysis across multiple datasets

03. Ability to translate and analyze visual changes

Abstract

This paper demonstrates how to use generative models trained for image synthesis as tools for visual data mining. Our insight is that since contemporary generative models learn an accurate representation of their training data, we can use them to summarize the data by mining for visual patterns. Concretely, we show that after finetuning conditional diffusion models to synthesize images from a specific dataset, we can use these models to define a typicality measure on that dataset. This measure assesses how typical visual elements are for different data labels, such as geographic location, time stamps, semantic labels, or even the presence of a disease. This analysis-by-synthesis approach to data mining has two key advantages. First, it scales much better than traditional correspondence-based approaches since it does not require explicitly comparing all pairs of visual elements. Second,…
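The abstract does not spell out the typicality measure, so the following is only a minimal sketch under one assumption: that typicality of an image for a label can be estimated as the average reduction in denoising error when the model is conditioned on that label rather than unconditioned. The names `typicality`, `toy_denoiser`, `MEANS`, and `ALPHAS_BAR` are hypothetical; `toy_denoiser` is a closed-form stand-in for a finetuned conditional diffusion model, not the authors' network.

```python
import numpy as np

# Hypothetical noise schedule (cumulative alpha products), kept away from 0 and 1.
ALPHAS_BAR = np.linspace(0.99, 0.01, 10)

# Hypothetical toy "dataset": each label is a single point in 2D.
MEANS = {"a": np.array([1.0, 1.0]), "b": np.array([-1.0, -1.0])}


def toy_denoiser(x_t, t, label):
    """Stand-in noise predictor (an assumption, not a trained model).

    With a label, it assumes the clean sample is that label's point.
    Without a label, it assumes the clean sample is the average of all points.
    """
    if label is None:
        mu = np.mean(list(MEANS.values()), axis=0)
    else:
        mu = MEANS[label]
    a = ALPHAS_BAR[t]
    return (x_t - np.sqrt(a) * mu) / np.sqrt(1.0 - a)


def typicality(x, label, denoiser, alphas_bar, n_samples=64, seed=0):
    """Monte Carlo estimate of how much conditioning on `label` helps denoise x."""
    rng = np.random.default_rng(seed)
    gains = []
    for _ in range(n_samples):
        t = rng.integers(len(alphas_bar))   # random timestep
        eps = rng.standard_normal(x.shape)  # random noise
        a = alphas_bar[t]
        x_t = np.sqrt(a) * x + np.sqrt(1.0 - a) * eps  # forward-noised sample
        err_uncond = np.sum((eps - denoiser(x_t, t, None)) ** 2)
        err_cond = np.sum((eps - denoiser(x_t, t, label)) ** 2)
        gains.append(err_uncond - err_cond)
    return float(np.mean(gains))


x = MEANS["a"]
score_a = typicality(x, "a", toy_denoiser, ALPHAS_BAR)
score_b = typicality(x, "b", toy_denoiser, ALPHAS_BAR)
print(score_a, score_b)
```

In this toy setup the sample drawn from label "a" scores positive for "a" and negative for "b", since the correct label lowers the denoising error and the wrong one raises it. Each sample is scored independently against the model, so the cost grows linearly with the number of elements rather than with the number of pairs.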

Taxonomy

Topics: Statistical and Computational Modeling

Methods: Diffusion · Focus