Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks
Xiaoyu Wu, Jiaru Zhang, Yang Hua, Bohan Lyu, Hao Wang, Tao Song, Haibing Guan

TL;DR
This paper investigates the corruption stage in few-shot fine-tuning of diffusion models, models this phenomenon theoretically, and proposes a Bayesian neural network approach to mitigate corruption and enhance image quality.
Contribution
The paper introduces a theoretical model of the corruption stage in diffusion model fine-tuning and applies Bayesian neural networks to mitigate this issue without extra inference costs.
Findings
Bayesian neural networks significantly reduce corruption in generated images.
The method improves fidelity, quality, and diversity in object- and subject-driven tasks.
The approach is compatible with existing fine-tuning methods and requires no additional inference cost.
Abstract
Few-shot fine-tuning of Diffusion Models (DMs) is a key advancement, significantly reducing training costs and enabling personalized AI applications. However, we explore the training dynamics of DMs and observe an unanticipated phenomenon: during the training process, image fidelity initially improves, then unexpectedly deteriorates with the emergence of noisy patterns, only to recover later with severe overfitting. We term the stage with generated noisy patterns as corruption stage. To understand this corruption stage, we begin by theoretically modeling the one-shot fine-tuning scenario, and then extend this modeling to more general cases. Through this modeling, we identify the primary cause of this corruption stage: a narrowed learning distribution inherent in the nature of few-shot fine-tuning. To tackle this, we apply Bayesian Neural Networks (BNNs) on DMs with variational inference…
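The idea of placing a variational posterior over network weights can be sketched on a single linear layer. This is a minimal illustration and not the authors' implementation: the class `BayesianLinear`, the function `training_loss`, the standard-normal prior, the `kl_weight` value, and the choice to use the posterior mean at inference are all assumptions made for this example. The sketch uses NumPy only and omits gradient updates; it shows the reparameterized weight sampling and the KL term that a variational-inference objective would combine with the usual denoising loss.

```python
import numpy as np


class BayesianLinear:
    """Linear layer with a factorized Gaussian posterior over its weights.

    Hypothetical illustration: names and defaults are not from the paper.
    """

    def __init__(self, in_dim, out_dim, init_sigma=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        # Variational parameters: per-weight mean and log standard deviation.
        self.mu = self.rng.normal(0.0, 0.1, size=(in_dim, out_dim))
        self.log_sigma = np.full((in_dim, out_dim), np.log(init_sigma))
        self.bias = np.zeros(out_dim)

    def forward(self, x, sample=True):
        if sample:
            # Reparameterization: w = mu + sigma * eps, with eps ~ N(0, I).
            eps = self.rng.standard_normal(self.mu.shape)
            w = self.mu + np.exp(self.log_sigma) * eps
        else:
            # Assumed inference mode: use the posterior mean, so the cost
            # matches that of an ordinary deterministic linear layer.
            w = self.mu
        return x @ w + self.bias

    def kl_to_prior(self, prior_sigma=1.0):
        # Closed-form KL( N(mu, sigma^2) || N(0, prior_sigma^2) ), summed over weights.
        sigma2 = np.exp(2.0 * self.log_sigma)
        kl = (np.log(prior_sigma) - self.log_sigma
              + (sigma2 + self.mu ** 2) / (2.0 * prior_sigma ** 2) - 0.5)
        return float(kl.sum())


def training_loss(layer, x, target, kl_weight=1e-3):
    # Data-fit term (a stand-in for the denoising loss) plus weighted KL regularizer.
    pred = layer.forward(x, sample=True)
    mse = float(np.mean((pred - target) ** 2))
    return mse + kl_weight * layer.kl_to_prior()
```

Calling `layer.forward(x, sample=True)` twice gives different outputs because new weights are drawn each time, while `sample=False` is deterministic. Under the assumptions above, that randomness is what would widen the distribution the model learns from only a few images.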
Peer Reviews
Decision: Submitted to ICLR 2025
- This work tries to analyze the fine-tuning process of foundational diffusion models.
- The authors include a user study to support the method's effectiveness.
**W1: Observed Issues with Incomplete Linkages**
1. **W1-1 Incomplete Rationale for “Finding a Sample to Minimize the Error Term”**
   - In line 229, the statement, *“pre-trained diffusion model finds a sample $( x^\star )$ to minimize the error term”*, appears potentially misleading. First, diffusion models do not involve an explicit search mechanism; the phrase *“finding a sample”* might imply a deliberate search process that does not actually occur within the diffusion framework.
   - Secon

- This work identifies and thoroughly investigates the "corruption stage" in few-shot fine-tuning of DMs, and it presents an innovative solution using BNNs to expand the learned distribution.
- The rigorous experimental design compares different fine-tuning methods across object-driven and subject-driven tasks. Multiple quantitative metrics (e.g., CLIP-T, DINO, CLIP-I) validate the effectiveness of the proposed method, and user studies support these quantitative results.
- The paper follows a cl
1. Most experiments focus on specific datasets like DreamBooth and the Stable Diffusion v1.5 model without testing other diffusion models or more advanced Stable Diffusion versions, such as SDXL, SD-v3/3.5. I understand that the authors may be trying to solve a fundamental problem, but whether this problem exists on more advanced models still needs to be proven.
2. While BNNs offer significant performance improvements, the paper lacks a detailed analysis of computational costs, such as memory u

1. The use of BNNs to address noise and degradation in diffusion models’ fine-tuning is innovative, providing new insights and effective mitigation strategies for this issue.
2. The paper provides a robust theoretical foundation and demonstrates the effectiveness of BNNs through various empirical tests, showcasing the improvement in image quality and diversity.
3. BNNs integrate smoothly with existing fine-tuning frameworks, and by not increasing inference costs, they retain practical relevanc

1. Although the study shows reasonable results within the specific task context, its testing is limited to the dataset used in DreamBooth, which is too small in scope. While the authors have experimented with fine-tuning using 1, 2, and 6 images, it remains unclear whether the observed phenomena ("corruption stage") would still occur when the dataset size is increased to 1000 images or when batch size is adjusted. Given approaches like ControlNet and IP-adapter, how can it be demonstrated that t
Taxonomy
Topics: Welding Techniques and Residual Stresses · Energy, Environment, and Transportation Policies · Nuclear reactor physics and engineering
Methods: Diffusion · Variational Inference
