Sampling Binary Data by Denoising through Score Functions

Francis Bach; Saeed Saremi

arXiv:2502.00557 · stat.ML · February 4, 2025 · 2 cites

Open Access

TL;DR

This paper extends score-based generative modeling to binary data by using Bernoulli noise, deriving a denoising formula, and proposing a Langevin-like sampler, enabling effective sampling and denoising of binary distributions.

Contribution

It introduces a denoising framework for binary data under Bernoulli noise, analogous to the Tweedie-Miyasawa formula (TMF) used with Gaussian noise, and develops a practical sampling method that does not rely on continuous stochastic processes.

Findings

01. Sampling becomes easier at high Bernoulli noise levels.

02. The proposed method effectively denoises and samples binary data.

03. Theoretical analysis confirms the sampler's efficiency across noise levels.

Abstract

Gaussian smoothing and a probabilistic framework for denoising via the empirical Bayes formalism, i.e., the Tweedie-Miyasawa formula (TMF), are the two key ingredients in the success of score-based generative models in Euclidean spaces. Smoothing is the key to easing the problem of learning and sampling in high dimensions, denoising is needed for recovering the original signal, and TMF ties these together via the score function of noisy data. In this work, we extend this paradigm to the problem of learning and sampling the distribution of binary data on the Boolean hypercube by adopting Bernoulli noise, instead of Gaussian noise, as a smoothing device. We first derive a TMF-like expression for the optimal denoiser for the Hamming loss, where a score function naturally appears. Sampling noisy binary data is then achieved using a Langevin-like sampler which we theoretically…
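
To make the denoising idea in the abstract concrete, here is a minimal sketch in Python on a tiny hypercube where everything can be enumerated. It rests on several assumptions that are not taken from the paper: bits are encoded as {0, 1}, Bernoulli noise flips each bit independently with probability `alpha` (with `alpha` different from 1/2), and all function names are hypothetical. The identity in `posterior_keep_from_ratio` is derived here directly from Bayes' rule and may differ in form from the paper's TMF-like expression. It shows that the posterior probability that a noisy bit is correct depends only on the ratio q(y with bit i flipped) / q(y) of the noisy distribution, which plays the role of a score. The Bayes-optimal denoiser for the Hamming loss then thresholds that probability coordinate by coordinate.

```python
import itertools
import numpy as np

def bernoulli_kernel(d, alpha):
    """Return all states of {0,1}^d and kernel[x, y] = P(y | x) under independent bit flips."""
    states = np.array(list(itertools.product([0, 1], repeat=d)))
    ham = (states[:, None, :] != states[None, :, :]).sum(axis=2)  # pairwise Hamming distances
    kernel = alpha ** ham * (1.0 - alpha) ** (d - ham)
    return states, kernel

def posterior_keep_bruteforce(p, states, kernel, y_idx):
    """P(x_i = y_i | y) for each coordinate i, by enumerating every clean x."""
    joint = p * kernel[:, y_idx]              # p(x) * P(y | x)
    post = joint / joint.sum()                # posterior over x given y
    agree = (states == states[y_idx]).astype(float)
    return post @ agree

def posterior_keep_from_ratio(q, states, y_idx, alpha):
    """Same quantity, computed only from ratios of the noisy marginal q (assumes alpha != 1/2)."""
    d = states.shape[1]
    weights = 2 ** np.arange(d - 1, -1, -1)   # maps a bit vector to its row index in `states`
    out = np.empty(d)
    for i in range(d):
        flipped = states[y_idx].copy()
        flipped[i] ^= 1                       # y with coordinate i flipped
        r = q[flipped @ weights] / q[y_idx]   # score-like ratio
        out[i] = (1 - alpha) * (1 - alpha - alpha * r) / (1 - 2 * alpha)
    return out

def denoise(y, keep_prob):
    """Hamming-optimal denoiser: keep a bit if it is more likely correct than not."""
    return np.where(keep_prob >= 0.5, y, 1 - y)

# Toy setup (illustrative values only): random distribution on {0,1}^3, flip probability 0.2.
d, alpha = 3, 0.2
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(2 ** d))            # clean distribution p(x)
states, kernel = bernoulli_kernel(d, alpha)
q = p @ kernel                                # noisy marginal q(y)

y_idx = 5
keep = posterior_keep_from_ratio(q, states, y_idx, alpha)
x_hat = denoise(states[y_idx], keep)
```

In a learned setting, the ratios of q would come from a trained model rather than from enumeration; the enumeration here only serves to check the identity on a case small enough to verify by hand.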

Taxonomy

Topics: Image and Signal Denoising Methods · Face and Expression Recognition · Anomaly Detection Techniques and Applications