Sampling Binary Data by Denoising through Score Functions
Francis Bach, Saeed Saremi

TL;DR
This paper extends score-based generative modeling to binary data by using Bernoulli noise, deriving a denoising formula, and proposing a Langevin-like sampler, enabling effective sampling and denoising of binary distributions.
Contribution
It introduces a denoising framework for binary data with Bernoulli noise, analogous to the Tweedie-Miyasawa formula (TMF) used for Gaussian noise, and develops a practical sampling method that does not rely on continuous stochastic processes.
Findings
Sampling becomes easier at high Bernoulli noise levels.
The proposed method effectively denoises and samples binary data.
Theoretical analysis confirms the sampler's efficiency across noise levels.
Abstract
Gaussian smoothing combined with a probabilistic framework for denoising via the empirical Bayes formalism, i.e., the Tweedie-Miyasawa formula (TMF), are the two key ingredients in the success of score-based generative models in Euclidean spaces. Smoothing holds the key for easing the problem of learning and sampling in high dimensions, denoising is needed for recovering the original signal, and TMF ties these together via the score function of noisy data. In this work, we extend this paradigm to the problem of learning and sampling the distribution of binary data on the Boolean hypercube by adopting Bernoulli noise, instead of Gaussian noise, as a smoothing device. We first derive a TMF-like expression for the optimal denoiser for the Hamming loss, where a score function naturally appears. Sampling noisy binary data is then achieved using a Langevin-like sampler which we theoretically…
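To make the setup concrete, here is a minimal sketch of the two ingredients the abstract names: Bernoulli noise as independent bit flips, and the Bayes-optimal denoiser for the Hamming loss. It is not the paper's code. The function names, the flip-probability parameterization (flip_prob), and the two-point toy prior are assumptions chosen for illustration. The denoiser is computed by brute-force enumeration over a known prior, using the standard fact that the Hamming loss is minimized by thresholding each coordinate's posterior marginal at 1/2. It does not use the score-based expression derived in the paper.

```python
import numpy as np

def bernoulli_noise(x, flip_prob, rng):
    """Flip each bit of x independently with probability flip_prob."""
    flips = rng.random(x.shape) < flip_prob
    return np.where(flips, 1 - x, x)

def hamming_optimal_denoiser(y, support, prior, flip_prob):
    """Brute-force Bayes denoiser for the Hamming loss (toy sizes only).

    support: (m, d) array of candidate clean binary vectors.
    prior:   (m,) array of their prior probabilities.
    Returns the coordinate-wise posterior mode and the marginals P(x_i = 1 | y).
    """
    d = support.shape[1]
    dist = np.sum(support != y, axis=1)              # Hamming distance to y
    lik = flip_prob ** dist * (1 - flip_prob) ** (d - dist)
    post = prior * lik
    post = post / post.sum()                          # posterior over support
    marginals = post @ support                        # P(x_i = 1 | y)
    return (marginals > 0.5).astype(int), marginals

# Hypothetical toy prior: clean data is 000 or 111 with equal probability.
support = np.array([[0, 0, 0], [1, 1, 1]])
prior = np.array([0.5, 0.5])

rng = np.random.default_rng(0)
noisy = bernoulli_noise(support[1], flip_prob=0.2, rng=rng)

# Denoise a fixed observation so the result does not depend on the seed.
y = np.array([1, 1, 0])
denoised, marginals = hamming_optimal_denoiser(y, support, prior, flip_prob=0.2)
```

For the observation 110 with flip_prob = 0.2, the posterior puts weight 0.8 on 111 and 0.2 on 000, so every marginal is 0.8 and the denoised vector is 111. Enumeration scales with the size of the support, so this sketch is only usable on tiny examples.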
Taxonomy
Topics: Image and Signal Denoising Methods · Face and Expression Recognition · Anomaly Detection Techniques and Applications
