Bootstrapping Diffusion: Diffusion Model Training Leveraging Partial and Corrupted Data

Xudong Ma

arXiv:2505.11825·cs.CV·May 20, 2025

TL;DR

This paper explores training diffusion models using partial and corrupted data, proposing a residual score function approach with theoretical guarantees for improved data efficiency and generalization.

Contribution

The paper introduces a method for training diffusion models from partial data views, supported by theoretical analysis and error bounds.

Findings

01. The residual score function approach reduces generalization error.

02. Training separate models per view improves data utilization.

03. The method achieves near first-order optimal data efficiency.

Abstract

Training diffusion models requires large datasets, but acquiring large volumes of high-quality data, such as high-resolution images and long videos, can be challenging. On the other hand, complementary data that is usually considered corrupted or partial is plentiful, such as low-resolution images and short videos. Other examples of corrupted data include videos that contain subtitles, watermarks, and logos. In this study, we investigate the theoretical problem of whether such partial data can be used to train conventional diffusion models. Motivated by our theoretical analysis, we propose a straightforward approach to training diffusion models with partial data views, where we treat each form of complementary data as a view of conventional data. Our proposed approach first trains one separate diffusion model for…
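
As a toy illustration of what a "view" of conventional data might look like, the sketch below treats a low-resolution image as an average-pooled view of a full image and forms the standard denoising score matching target on that view. The names `low_res_view` and `dsm_target`, the pooling operator, the image size, and the noise level are all assumptions made for illustration; this is not the paper's algorithm, only one plausible reading of the per-view training setup.

```python
import numpy as np


def low_res_view(x, factor=2):
    """Hypothetical view: average-pool a 2D image by `factor` along each axis."""
    h, w = x.shape
    # Split each axis into (blocks, factor) and average inside every block.
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def dsm_target(x0, sigma, rng):
    """Add Gaussian noise to x0 and return (x_t, score target).

    For x_t ~ N(x0, sigma^2 I), the conditional score at x_t is
    -(x_t - x0) / sigma^2, which equals -eps / sigma.
    """
    eps = rng.standard_normal(x0.shape)
    x_t = x0 + sigma * eps
    return x_t, -eps / sigma


rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))   # stand-in for a full-resolution sample
view = low_res_view(image)            # stand-in for a partial (low-res) sample
x_t, target = dsm_target(view, sigma=0.5, rng=rng)
```

A per-view model would then be fit by regressing its output at `x_t` onto `target`; how the per-view models are combined afterwards is left to the paper itself.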

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Face recognition and analysis

Methods: Diffusion