Pruning then Reweighting: Towards Data-Efficient Training of Diffusion Models

Yize Li; Yihua Zhang; Sijia Liu; Xue Lin

arXiv:2409.19128·cs.CV·October 3, 2024

TL;DR

This paper proposes a data-efficient training method for diffusion models by combining dataset pruning with class-wise reweighting, resulting in faster training with comparable image synthesis quality.

Contribution

It introduces a novel data selection and reweighting approach for diffusion models, improving training efficiency without sacrificing generation quality.

Findings

01. Achieves 2.34x to 8.32x speed-up on CIFAR-10 with comparable quality

02. Extends the method to latent diffusion models like SD and MDT

03. Demonstrates effectiveness on ImageNet dataset

Abstract

Despite the remarkable generation capabilities of Diffusion Models (DMs), conducting training and inference remains computationally expensive. Previous works have been devoted to accelerating diffusion sampling, but achieving data-efficient diffusion training has often been overlooked. In this work, we investigate efficient diffusion training from the perspective of dataset pruning. Inspired by the principles of data-efficient training for generative models such as generative adversarial networks (GANs), we first extend the data selection scheme used in GANs to DM training, where data features are encoded by a surrogate model, and a score criterion is then applied to select the coreset. To further improve the generation performance, we employ a class-wise reweighting approach, which derives class weights through distributionally robust optimization (DRO) over a pre-trained reference DM.…
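The two stages named in the abstract (score-based coreset selection on surrogate features, then class-wise reweighting) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration, not the paper's method: the abstract does not state the score criterion, so distance to the class mean is used as a placeholder, and the weight update is a generic exponentiated-gradient step of the kind common in DRO, driven by each class's excess loss over a reference model. The names `select_coreset`, `update_class_weights`, `keep_ratio`, and `step_size` are hypothetical.

```python
import numpy as np


def select_coreset(features, labels, keep_ratio):
    """Keep the top `keep_ratio` fraction of each class, ranked by a score.

    Placeholder score (an assumption): negative distance to the class mean
    in the surrogate feature space, so samples near the mean rank highest.
    """
    kept = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        center = features[idx].mean(axis=0)
        score = -np.linalg.norm(features[idx] - center, axis=1)
        n_keep = max(1, int(round(keep_ratio * len(idx))))
        # argsort is ascending, so reverse it to take the highest scores.
        kept.append(idx[np.argsort(score)[::-1][:n_keep]])
    return np.sort(np.concatenate(kept))


def update_class_weights(weights, class_losses, ref_losses, step_size=1.0):
    """One exponentiated-gradient step on the class weights.

    Assumption: classes whose loss exceeds the reference model's loss are
    upweighted; weights are renormalized to sum to 1.
    """
    excess = np.maximum(class_losses - ref_losses, 0.0)
    new_weights = weights * np.exp(step_size * excess)
    return new_weights / new_weights.sum()


# Toy usage with random stand-ins for surrogate features and per-class losses.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))   # 200 samples, 16-dim features
labels = np.repeat(np.arange(4), 50)    # 4 classes, 50 samples each
kept = select_coreset(features, labels, keep_ratio=0.2)

weights = update_class_weights(
    np.full(4, 0.25),                   # start from uniform class weights
    np.array([1.0, 1.2, 0.9, 2.0]),     # made-up per-class losses
    np.ones(4),                         # made-up reference losses
)
```

In a training loop, the resulting `weights` would scale each class's contribution to the diffusion loss computed on the `kept` subset; how the paper actually schedules these updates is not stated in the abstract.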

Code & Models

Repositories

yeez-lee/data-selection-and-reweighting-for-diffusion-models (Official)

Taxonomy

Topics: Speech Recognition and Synthesis · Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Layer Normalization · Dense Connections · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding