Training Data Protection with Compositional Diffusion Models

Aditya Golatkar; Alessandro Achille; Ashwin Swaminathan; Stefano Soatto

arXiv:2308.01937·cs.LG·October 15, 2024·2 cites

TL;DR

This paper presents Compartmentalized Diffusion Models (CDMs), which train separate diffusion models on distinct data sources and compose them at inference time. This protects training data with minimal quality loss and improves alignment in text-to-image tasks.

Contribution

Introduction of CDMs that allow isolated training, flexible composition, and data protection in diffusion models, improving efficiency and privacy without significant performance degradation.

Findings

01

Class-conditional CDMs (8 splits) achieve FID within 10% of a monolithic model on fine-grained vision datasets.

02

CDMs enable 8x faster forgetting with minimal FID increase.

03

CDMs improve alignment (TIFA) by 14.33% in text-to-image generation.

Abstract

We introduce Compartmentalized Diffusion Models (CDM), a method to train different diffusion models (or prompts) on distinct data sources and arbitrarily compose them at inference time. The individual models can be trained in isolation, at different times, and on different distributions and domains, and can later be composed to achieve performance comparable to a paragon model trained on all data simultaneously. Furthermore, each model only contains information about the subset of the data it was exposed to during training, enabling several forms of training data protection. In particular, CDMs enable perfect selective forgetting and continual learning for large-scale diffusion models, and allow serving customized models based on the user's access rights. Empirically, the quality (FID) of the class-conditional CDMs (8-splits) is within 10% (on fine-grained vision datasets) of a monolithic…
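
The abstract as shown here is truncated and does not spell out how the separately trained models are composed, so the following is only a toy illustration and not the paper's implementation. It relies on a standard identity: the score (gradient of the log density) of a mixture equals the posterior-weighted average of the component scores. One-dimensional Gaussians stand in for models trained on separate data sources, and all function names (`gaussian_score`, `composed_score`, `mixture_log_density`) are hypothetical. Dropping one source from the lists gives a crude picture of selective forgetting: the composed score then depends only on the remaining sources.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def gaussian_score(x, mean, std):
    """Score d/dx log N(x; mean, std^2); stands in for one per-source model."""
    return -(x - mean) / std ** 2


def mixture_log_density(x, means, stds, priors):
    """Log density of the mixture over all sources (the 'all data' reference)."""
    x = np.asarray(x, dtype=float)
    priors = np.asarray(priors, dtype=float)
    priors = priors / priors.sum()
    dens = sum(p * gaussian_pdf(x, m, s) for p, m, s in zip(priors, means, stds))
    return np.log(dens)


def composed_score(x, means, stds, priors):
    """Compose per-source scores using posterior weights over sources.

    Assumption for this toy: the weight of source i at x is
    prior_i * p_i(x) / sum_j prior_j * p_j(x).
    """
    x = np.asarray(x, dtype=float)
    priors = np.asarray(priors, dtype=float)
    priors = priors / priors.sum()  # renormalize, e.g. after dropping a source
    dens = np.stack([p * gaussian_pdf(x, m, s)
                     for p, m, s in zip(priors, means, stds)])
    weights = dens / dens.sum(axis=0)
    scores = np.stack([gaussian_score(x, m, s) for m, s in zip(means, stds)])
    return (weights * scores).sum(axis=0)


if __name__ == "__main__":
    xs = np.linspace(-3.0, 3.0, 7)
    means, stds, priors = [-1.0, 2.0], [0.5, 1.0], [0.3, 0.7]
    # Score composed from the two "separately trained" sources.
    print(composed_score(xs, means, stds, priors))
    # "Forget" the first source: only the second one contributes.
    print(composed_score(xs, means[1:], stds[1:], priors[1:]))
```

In this toy setting the composed score can be checked against a finite-difference derivative of `mixture_log_density`, which plays the role of the model trained on all data at once.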

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Mycobacterium research and diagnosis

Methods: Diffusion