Taking a Big Step: Large Learning Rates in Denoising Score Matching Prevent Memorization

Yu-Han Wu; Pierre Marion; Gérard Biau; Claire Boyer

arXiv:2502.03435 · stat.ML · May 7, 2025

Open Access

TL;DR

This paper shows that large learning rates in denoising score matching induce an implicit regularization effect that prevents neural networks from memorizing the training data, so that generated samples do not simply replicate training examples.

Contribution

It uncovers an implicit regularization mechanism driven by large learning rates that mitigates memorization in denoising score matching models.
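For reference, the objects involved can be written down as follows. This is a sketch under stated assumptions, not the paper's exact setup: it assumes n training points x_1, ..., x_n and a variance-exploding noising x̃ = x_i + σε with ε ~ N(0, I); the paper's forward process may include an additional scaling. The first line is the denoising score matching loss at noise level σ, the second its exact minimizer over all functions (the empirical optimal score, i.e. the score of the Gaussian-smoothed empirical distribution), and the third the one-dimensional derivative of that minimizer.

```latex
\mathcal{L}_\sigma(s) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}_{\varepsilon \sim \mathcal{N}(0, I)} \left\| s(x_i + \sigma \varepsilon) + \frac{\varepsilon}{\sigma} \right\|^2

s^\star_\sigma(x) = \sum_{i=1}^{n} w_i(x)\, \frac{x_i - x}{\sigma^2},
\qquad
w_i(x) = \frac{\exp\!\left(-\|x - x_i\|^2 / (2\sigma^2)\right)}{\sum_{j=1}^{n} \exp\!\left(-\|x - x_j\|^2 / (2\sigma^2)\right)}

(s^\star_\sigma)'(x) = -\frac{1}{\sigma^2} + \frac{\operatorname{Var}_{w(x)}(x_i)}{\sigma^4}
```

Midway between two neighbouring training points a distance Δ apart, the weighted variance is about Δ²/4, so the slope is roughly Δ²/(4σ⁴) and blows up as σ → 0. This is one way to see why the exact minimizer is highly irregular in the small-noise regime.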

Findings

01

In the small-noise regime, the empirical optimal score is highly irregular.

02

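Neural networks trained by stochastic gradient descent with a sufficiently large learning rate cannot stably converge to a local minimum with arbitrarily small excess risk, so they cannot converge to overly memorizing solutions.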

03

Experiments confirm that the regularization effect of large learning rates extends beyond one-dimensional data.
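The irregularity described in Finding 01 can be checked numerically on a toy example. The sketch below is illustrative only and rests on assumptions not taken from the paper: a variance-exploding Gaussian smoothing of the training points (x̃ = x_i + σε), a hypothetical three-point 1D dataset, and finite differences on a grid. The function names empirical_optimal_score and max_slope are our own.

```python
import numpy as np


def empirical_optimal_score(x, data, sigma):
    """Score of the smoothed empirical law (1/n) * sum_i N(x_i, sigma^2), in 1D.

    Assumption (for illustration): variance-exploding noising x = x_i + sigma * eps.
    """
    x = np.asarray(x, dtype=float)[:, None]        # query points, shape (m, 1)
    diffs = data[None, :] - x                      # x_i - x, shape (m, n)
    logits = -diffs**2 / (2 * sigma**2)
    logits -= logits.max(axis=1, keepdims=True)    # stabilize the softmax
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)              # posterior weights w_i(x)
    return (w * diffs).sum(axis=1) / sigma**2      # sum_i w_i(x) (x_i - x) / sigma^2


def max_slope(data, sigma, grid):
    """Largest |s'(x)| on the grid, estimated by finite differences."""
    s = empirical_optimal_score(grid, data, sigma)
    return np.max(np.abs(np.diff(s) / np.diff(grid)))


data = np.array([-1.0, 0.0, 1.0])                  # hypothetical training set
grid = np.linspace(-2.0, 2.0, 20001)
for sigma in [0.5, 0.2, 0.1, 0.05]:
    print(f"sigma={sigma:5.2f}  max |s'(x)| ~ {max_slope(data, sigma, grid):.3e}")
```

As σ shrinks, the maximal slope should grow roughly like Δ²/(4σ⁴) (here Δ = 1), concentrated at the midpoints between training points, where the score flips from pointing at one sample to pointing at its neighbour.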

Abstract

Denoising score matching plays a pivotal role in the performance of diffusion-based generative models. However, the empirical optimal score, which is the exact solution to the denoising score matching problem, leads to memorization: generated samples replicate the training data. Yet, in practice, only a moderate degree of memorization is observed, even without explicit regularization. In this paper, we investigate this phenomenon by uncovering an implicit regularization mechanism driven by large learning rates. Specifically, we show that in the small-noise regime, the empirical optimal score exhibits high irregularity. We then prove that, when trained by stochastic gradient descent with a large enough learning rate, neural networks cannot stably converge to a local minimum with arbitrarily small excess risk. Consequently, the learned score cannot be arbitrarily close to the empirical optimal…
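The intuition behind "cannot stably converge" can be illustrated with the standard linear-stability argument for gradient descent, shown here on a one-dimensional quadratic. This is a hedged toy illustration, not the paper's proof (which concerns SGD on neural networks): with learning rate η, each step multiplies the parameter by (1 - ηλ), so a minimum with curvature λ is stable only if λ < 2/η. Large learning rates therefore rule out sharp minima, such as those a network would need in order to fit a highly irregular target. The function name gd_on_quadratic is hypothetical.

```python
def gd_on_quadratic(curvature, lr, theta0=1.0, steps=100):
    """Run plain gradient descent on f(theta) = 0.5 * curvature * theta**2.

    Each step multiplies theta by (1 - lr * curvature), so the iterates converge
    to the minimum at 0 only if |1 - lr * curvature| < 1, i.e. curvature < 2 / lr.
    """
    theta = theta0
    for _ in range(steps):
        theta -= lr * curvature * theta   # gradient of f is curvature * theta
    return theta


lr = 0.5                                  # "large" step size: threshold 2 / lr = 4
for curvature in [1.0, 3.0, 3.9, 4.1, 10.0]:
    final = gd_on_quadratic(curvature, lr)
    status = "stable" if abs(final) < 1.0 else "unstable"
    print(f"curvature={curvature:5.1f}  |theta_100|={abs(final):.2e}  {status}")
```

With η = 0.5, the minima with curvature 1.0, 3.0 and 3.9 attract the iterates, while those with curvature 4.1 and 10.0 repel them, even though every one of them is an exact minimum of its loss.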

Taxonomy

Topics: Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks