Lattice Random Walk Discretisations of Stochastic Differential Equations
Samuel Duffield, Maxwell Aifer, Denis Melanson, Zach Belateche, Patrick J. Coles

TL;DR
This paper presents a novel lattice random walk discretisation method for SDEs that simplifies computations to binary or ternary steps, enabling efficient stochastic computing and robust handling of complex drifts.
Contribution
The paper introduces a new lattice-based discretisation scheme for SDEs that reduces complexity and enhances compatibility with stochastic computing architectures.
Findings
Proves weak convergence of the discretisation scheme
Demonstrates robustness to quantisation errors
Shows effectiveness on various SDEs including diffusion models
Abstract
We introduce a lattice random walk discretisation scheme for stochastic differential equations (SDEs) that samples binary or ternary increments at each step, suppressing complex drift and diffusion computations to simple 1 or 2 bit random values. This approach is a significant departure from traditional floating point discretisations and offers several advantages; including compatibility with stochastic computing architectures that avoid floating-point arithmetic in place of directly manipulating the underlying probability distribution of a bitstream, elimination of Gaussian sampling requirements, robustness to quantisation errors, and handling of non-Lipschitz drifts. We prove weak convergence and demonstrate the advantages through experiments on various SDEs, including state-of-the-art diffusion models.
| Work |
|
|
|||||
|---|---|---|---|---|---|---|---|
| Floating Point EM | |||||||
| Stochastic LRW |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Mathematical Modeling in Engineering · Markov Chains and Monte Carlo Methods · Model Reduction and Neural Networks
Lattice Random Walk Discretisations of Stochastic Differential Equations
Samuel Duffield
Maxwell Aifer
Denis Melanson
Zach Belateche
Patrick J. Coles
Normal Computing Corporation, New York, New York, USA
Abstract
We introduce a lattice random walk discretisation scheme for stochastic differential equations (SDEs) that samples binary or ternary increments at each step, suppressing complex drift and diffusion computations to simple 1 or 2 bit random values. This approach is a significant departure from traditional floating point discretisations and offers several advantages; including compatibility with stochastic computing architectures that avoid floating-point arithmetic in place of directly manipulating the underlying probability distribution of a bitstream, elimination of Gaussian sampling requirements, robustness to quantisation errors, and handling of non-Lipschitz drifts. We prove weak convergence and demonstrate the advantages through experiments on various SDEs, including state-of-the-art diffusion models.
I Introduction
Stochastic differential equations (SDEs) are a powerful tool for modelling a wide range of phenomena across physics, finance, biology, and machine learning. In recent years, SDEs have become particularly crucial in modern machine learning, where they form the mathematical foundation of diffusion models—the breakthrough technology behind state-of-the-art image generation systems [56, 17]. This is in addition to established fields such as molecular dynamics [37] and Bayesian inference [27, 14] where the simulation of complex SDEs is the key workhorse.
Despite their widespread use, simulating SDEs presents significant computational challenges [26, 7]. SDEs are inherently continuous-time objects, which immediately creates difficulties for numerical implementation since digital computers can only perform discrete operations. Unlike ordinary differential equations, SDEs are augmented with continuous-time random noise. This stochastic nature means that each simulation path is different, requiring multiple realizations to estimate statistical properties. Moreover, the interplay between deterministic drift and stochastic noise diffusion terms creates complex dynamics that can be sensitive to discretisation choices, particularly in high-dimensional systems [54]. Additionally, SDE simulation is an inherently sequential process and therefore receives little benefit from the parallelization of modern GPU hardware.
The most widely used numerical method, Euler–Maruyama, approximates the continuous-time SDE by discretizing both the deterministic drift and stochastic diffusion terms. While conceptually simple and easy to implement, Euler–Maruyama suffers from several limitations: it assumes infinite numerical precision, making it vulnerable to quantisation errors in hardware implementations; and it can fail catastrophically for non-Lipschitz drift functions, which are common in modern applications [24, 32]. It also is not perfectly suited to digital hardware due to the assumption of infinite precision and requirement for Gaussian sampling.
Specifically, our proposed lattice random walk (LRW) scheme for discretising SDEs, offers several advantages over existing methods (e.g., Euler–Maruyama), which include but may not be limited to:
- •
Enabling stochastic computing for SDE simulation, unlocking potentially massive speedups on bespoke noise-based digital hardware.
- •
No Gaussian sampling required, which removes a non-trivial subroutine that involves transcendental functions and assumes infinite precision.
- •
Robustness to quantisation, the generated samples lie on an integer lattice and therefore their quantisation is directly incorporated into the weak error analysis as opposed to existing methods. Further the increment at each step is binary or ternary and therefore significantly more robust to error in the underlying drift and diffusion functions (such as that arising from quantisation).
- •
Handling non-Lipschitz drifts, as with the discretisation scheme of [32], but unlike Euler–Maruyama which is known to perform poorly for non-globally Lipschitz drifts [23].
The LRW discretization as well as Euler-Maruyama and a continuous-time trajectory are visualized on a toy two-dimensional distribution in Fig. 1, where one can clearly see the drastic difference in approaches between assuming infinite precision and traversing a lattice.
The paper is structured as follows: In Section II we introduce the lattice random walk discretisation, establish its (weak) convergence and then describe related work in Section III. In Section IV we expand on each of the aforementioned advantages of LRW and discuss its implementation. In Section V, we present experiments demonstrating the advantages of LRW as well as its scalability to state-of-the-art image diffusion models. In Section VI we discuss the results, limitations as well as future work and potential for impact.
II Lattice Random Walk
We consider methods for discretizing stochastic differential equations (SDEs), which in full generality have the form
[TABLE]
where is denoted the drift vector, is denoted the diffusion matrix and is a standard multivariate Brownian motion.
The most widely used numerical method for discretizing SDEs is the Euler–Maruyama (EM) scheme, which approximates the continuous-time SDE (1) by discretizing both the deterministic drift and stochastic diffusion terms. The scheme takes the form
[TABLE]
for temporal stepsize .
In stark contrast, our LRW scheme discretizes the SDE (1) by sampling a ternary-valued increment at each step. Specifically, we consider the following multivariate discrete-time ternary-valued update:
[TABLE]
where indexes the coordinate of a -dimensional vector. Here, the probability vectors depend on the SDE (1) and are defined as
[TABLE]
Here, and from now on, we assume to be diagonal and behave as a vector with powers and multiplication understood elementwise, we elaborate on the restrictiveness of this assumption in Section II.3. The parameter is a spatial stepsize vector, whereas is a temporal stepsize scalar (as in Euler-Maruyama).
The motivation behind the probabilities (4) becomes apparent upon the calculation of the first and second increment moments
[TABLE]
Intuitively, we would then expect the discrete scheme to converge to the true SDE (1) as . This can be formalised with the notions of weak and strong convergence [36, 51].
II.1 Weak Convergence
Weak convergence measures how well a discretisation scheme approximates the statistical properties of the true SDE solution. Unlike strong convergence, which concerns individual sample paths, weak convergence focuses on the accuracy of expectations of test functions evaluated at the discretized solution. This in some sense is a necessary requirement for an SDE discretisation scheme as it ensures expectations with respect to the true SDE are recovered as . Specifically, a method with weak order guarantees that expectations converge to the true SDE solution at a rate of as the temporal stepsize , ensuring that the discretisation faithfully recovers the underlying continuous-time stochastic process.
Theorem 1** (Weak convergence of the LRW discretisation).**
Consider the SDE (1) with drift function and diagonal diffusion matrix that are sufficiently smooth. Let be a test function with bounded derivatives. Then the LRW discretisation (3) with spatial stepsize has weak order 1, i.e.,
[TABLE]
where is the discretized solution at time and is the true SDE solution.
Proof.
Provided in Appendix A. ∎
Here we use the notation to denote a function that is bounded above and below by constant multiples of the argument.
The weak order 1 result ensures that the LRW discretisation recovers the true SDE solution in the limit of small temporal stepsize and its rate matches that of Euler-Maruyama. There are of course higher order methods that can achieve higher weak order convergence rates, such the general class of stochastic Runge-Kutta methods [11], see Table 1. Often these are more complex to implement (such as requiring many queries to the drift function per iteration) or require some restriction on the general SDE (1).
II.2 Selecting
In the Euler-Maruyama discretisation (2), there is a single tuning parameter in ; for LRW we also have (which behaves in the same way as for EM controlling the level of temporal discretisation error and the number of iterations needed to reach a specified time ). However, in addition, we have the spatial stepsize which we now give some intuition on this new tuning parameter’s behaviour and specification.
For a valid ternary distribution, we have two constraints
[TABLE]
where we have used (4) and represents elementwise absolute value with the inequality required across all dimensions.
Theorem 2** (Allowable range for ).**
For the probabilities in (4) to represent a valid ternary distribution must satisfy
[TABLE]
for specified , and .
Proof.
Follows from basic manipulations of (6). ∎
Theorem 3** (Feasibility condition for ).**
For the range in (7) to be non-empty we must have
[TABLE]
Proof.
Follows from setting equality in (7). ∎
This feasibility condition makes the stochastic nature of the discretisation apparent. Since we need , (8) implies we also need , thus the discretisation is inherently stochastic and does not reduce to an ordinary differential equation (ODE) discretisation in the same way Euler-Maruyama does with . However, as we will see in Section II.4, we can adjust the LRW discretisation into one that is consistent for deterministic ODEs.
The first criterion we have to satisfy is the feasibility condition (8). The user will have freedom to choose and in some cases . Thus, they can either decrease or increase to ensure the feasibility condition is met. In practice, we may not have intricate knowledge of for each dimension and varying and thus the user chooses and perhaps to the best of their knowledge to ensure confidence in the feasibility condition is met. With the tradeoff being that decreasing increases the number of steps to get to a given time whilst increasing increases the noise in the SDE, which may be undesirable (if indeed it can be tuned).
Once and are set and we have confidence that the feasibility condition is met, we can turn to the selection of . As mentioned, we typically have little knowledge on the range of but more so on the range of (i.e. in the common case of fixed ). We can often ensure the lower bound of (7) is met since it only requires knowledge of and , this gives us a rule of thumb for specifying .
Rule of Thumb**.**
Setting according to
[TABLE]
for , ensures the lower bound in (7) is satisfied. Then the upper bound is also satisfied so long as the feasibility condition (8) holds.
Theorem 4** (Reduction from ternary to binary).**
For a single LRW iteration, where the feasibility condition (8) holds, then setting
[TABLE]
reduces the ternary distribution (4) to a binary distribution with . Therefore if we have constant , the ternary update with rule of thumb reduces to binary at all iterations.
Proof.
Direct from (6). ∎
We also note that, although we have described (9) as a rule of thumb for setting for a specified , it also satisfies the condition required for weak convergence in Section II.1.
In practice, we likely cannot guarantee (8) globally, particularly in the case of non-globally Lipschitz drifts, Section IV.4. In this case we can still obtain a stable discretisation by clipping the probabilities appropriately. That is, we can clip the coordinates of so that which ensures . Then clip the coordinates of such that to ensure . This clipping naturally modifies the moments (5), but as the possibility of clipping disappears, therefore does not affect weak convergence.
For drift or diffusion functions that have a large dynamic range over the course of a SDE trajectory, we can also generalize to be a function of or even so long as we still have . This is particularly useful for SDEs with time-varying diffusion functions, such as variance-exploding diffusion models [56].
II.3 Diagonal Diffusion Assumption
The LRW probabilities in (4) as written make the assumption that is diagonal. This is a strong assumption, but it covers the majority of applications including Langevin-based sampling from Bayesian posteriors [14] or molecular dynamics equilibrium distributions [37], as well as modern machine learning diffusion models [56]. We note that this diagonal diffusion limitation also features in the discretisation scheme of Ref. [32].
Additionally, general SDEs can be converted into one with a constant and identity diffusion matrix via a Lamperti transform. This is detailed in Appendix B for the simpler case when the (dense) diffusion matrix depends on but not , for the complete case see [48]. Thus the majority of SDEs with dense diffusion matrices (those that depend on but not ) can be handled easily by the LRW discretisation via the Lamperti transform and therefore we leave the general case of a space-dependent dense diffusion matrices as future work.
II.4 Lattice Random Walk for Ordinary Differential Equations
An alternative LRW discretisation that is consistent for ODEs can be achieved with a variance that is quadratic in the stepsize since then the variance decays faster than the mean as .
Specifically if the increment’s moments are modified to have the form
[TABLE]
Then for some independent of the discretisation will converge to the ODE as .
Since ODEs are deterministic, we don’t typically differentiate between strong and weak error, since they coincide. In this case, we don’t necessarily expect the above LRW discretisation to have first order error because of the required injected noise. However, for many applications, such as Bayesian sampling and inference in diffusion models in Section V.3 a version of weak error may still apply for ODEs where expectations are taken with respect to a random initial distribution . A full error analysis in this setting of ODE discretisation is left for future work.
III Related Work
Lattice random walks have a rich history in physics and mathematics, dating back to early work on the development of discrete-time Markov processes on regular lattices [18]. These traditional approaches typically model discrete stochastic systems or natural phenomena that are inherently lattice-based, such as particle diffusion in crystalline structures [49] or polymer dynamics on lattices [53]. In contrast, our LRW discretisation is fundamentally different: rather than modeling discrete systems, we use lattice random walks as a computational tool to approximate continuous-time, continuous-space SDEs that arise in diverse applications from molecular dynamics to machine learning.
SDEs represent an extremely broad class of continuous-time stochastic processes and a large variety of numerical methods exist for simulating their evolution [54]. As discussed, the Euler-Maruyama method (2) is the most widely used due to its simplicity and applicability to fully general SDEs (1). Alternative methods and their relation to LRW can be found in Table 1. Which notably includes stochastic Runge-Kutta methods [11] which use multiple steps to increase the weak convergence order as well as tamed [29] and implicit [28] variants on Euler-Maruyama for improved stability to non-globally Lipschitz drift functions.
There has also been developed stochastic discretisations which modify the Euler-Maruyama method replacing the Gaussian noise source with a binary-valued source (so-called two-point methods) or ternary-valued source (so-called three-point methods) [36]. However here the discrete noise source is only applied to the diffusion term and still requires a component of the form for the drift term which contrasts significantly with the LRW update (3) where a full iteration is binary or ternary valued.
The most closely related method to LRW is the skew-symmetric discretisation in [32], which builds on earlier work on spatially discretized stochastic processes [6]. This discretisation assumes a time-homogenous SDE and takes the form
[TABLE]
where and is a binary-valued random variable taking values in with probabilities that ensure the weak order 1 convergence rate. The skew-symmetric discretisation shares similarities with the LRW, notably that the drift computation is absorbed into the sampling of a binary-valued random variable and that they share the same weak error and diagonal diffusion assumption. However, there remains notable differences including that the skew-symmetric discretisation requires Gaussian sampling, doesn’t produce samples on a lattice, makes the assumption of time-homogeneity in the SDE, applies the diffusion computation outside of the binary sampling as well as having a significantly different (and more complex) form for the binary random variable probabilities. These differences are key in enabling implementation on a stochastic computer as well as application to time-inhomogeneous SDEs found in modern diffusion models.
IV Advantages of Lattice Random Walk
We now go into more detail on the advantages of the introduced LRW discretisation over existing methods.
IV.1 Stochastic Computing
Stochastic computing [19, 3, 21] represents an alternative paradigm to traditional deterministic digital hardware. It is based on the idea that a real number can be encoded as the underlying probability distribution of a random bitstream. This real number can then be manipulated by performing operations on the bitstream and never having to store the real number itself or apply floating point arithmetic. For this reason stochastic computing can be highly time and energy efficient. However, it suffers from the issue that many applications and pipelines require a real valued output and therefore the aggregation of the output bitstream which can nullify the speedup of the internal operations. Our lattice random walk discretisation utilizes complex underlying operations but only requires a binary or ternary random variable at each step, therefore removing the aforementioned issue of aggregation of the output and enabling stochastic computing for the key pipeline of SDE simulation.
Stochastic computing has been applied to a range of computational tasks, leveraging its error resilience and low hardware cost. Notable applications include the numerical integration of ordinary differential equations [42, 9], neural network acceleration [3, 38], and probabilistic inference [19]. Yet to our knowledge stochastic computing has not yet been applied to the simulation of general SDEs, critically due to the lack of stochastic hardware-compatible theoretical methods which we unlock in this work.
As an example of the potential efficiencies of stochastic computing, consider a stochastic (unipolar) number (that is a random bit generated from ) and a stochastic number then a new stochastic number representing the product can be computed with a simple AND gate . Similarly we have and other basic gates, see Chapter 5 in [21] (although the NAND gate is universal for boolean logic). This makes linear operations (such as matrix multiplications) in stochastic computing extremely efficient and can be applied in a single input bit to single output bit model of computation. And importantly avoid any floating point arithmetic or quantisation error beyond that of the original stochastic numbers generation.
However, non-linear operations are significantly more difficult and fundamentally require multiple stochastic bits to generate a single output bit, that is multiple bits are required to generate a single output bit for non-linear . Still, efficient and general-purpose non-linear univariate protocols have been developed including finite-state machines [39], Bernstein polynomials [52], Maclaurin expansions [50], piecewise linear approximations [43] and recently Chebyshev polynomials [35]. The Kolmogorov-Arnold representation theorem implies that any multivariate continuous function can be decomposed into the composition of matrix multiplication and elementwise non-linearities, thus robust implementations of non-linear univariate protocols combined with the linear arithmetic inherent to stochastic computing is sufficient for universal (continuous) stochastic computation.
A key limitation of stochastic computing is that it requires a conversion between the stochastic domain (bitstreams) and the conventional domain (real numbers). This conversion typically requires aggregating long bitstreams to estimate probabilities, which can nullify the computational advantages gained during the stochastic operations. This bottleneck has limited the practical application of stochastic computing. The setting of the LRW discretisation fundamentally alleviates this limitation. Unlike traditional SDE discretisation methods that require floating-point arithmetic for both drift and diffusion computations, LRW reduces each iteration to sampling a simple discrete random variable, binary or ternary (noting that a ternary random variable can be represented by two binary random variables). This output inherently avoids the issue of aggregating stochastic bitstreams back to real numbers since the required output is binary or ternary, as depicted in Figure 2.
We note that a stochastic integrator, a fundamental circuit used in stochastic computing ODE solvers [19, 42, 41] can be viewed as a ternary update. As such, the LRW reduction of SDEs to a simple ternary update serves as an analytical description of stochastic computing that fully incorporates its inherent stochasticity, thus enabling the use of randomness as a computational parameter instead of only a source of error as in stochastic computing implementations for deterministic processes [42, 9].
In Appendix F, we give a concrete stochastic computing protocol for LRW on linear SDEs, i.e. Ornstein-Uhlenbeck (OU) processes . It utilises a variant on the commonly used stochastic computing multiplexer (MUX) primitive for scaled addition [21]. By computing alias tables [57] in a fast preprocessing step, categorical sampling can be performed in time, enabling a single iteration of LRW with work and parallel time versus traditional matrix-vector product arithmetic which has work and parallel time [5]. The overall complexities required to achieve weak error are detailed in Table 2, which in both cases requires steps since both EM and LRW have weak order 1.
We note several caveats to this protocol: (i) the MUX-based approach imposes a maximal stepsize constraint (see Appendix F), (ii) the protocol is specific to linear OU processes and does not directly extend to non-linear drifts, and (iii) OU processes can alternatively be sampled exactly with preprocessing via matrix decomposition [15], though this requires significantly more complex floating-point operations whilst higher-order discretisations (see Table 1) can also improve the dependency on .
The key advantage of the stochastic computing approach is that it replaces floating-point matrix-vector products with simple bit-level operations, potentially enabling significant energy and latency improvements on specialized hardware. Notably CPUs and GPUs are designed for dense floating point arithmetic whereas the stochastic approach moves the bottleneck to repeated categorical sampling via alias tables which represents a fundamentally different pipeline ripe for specialized hardware acceleration.
IV.2 No Gaussian Sampling
Almost all existing SDE discretisation methods require Gaussian sampling with the exception of the two-point (or three-point) methods from [36]. Gaussian sampling on digital hardware represents a non-trivial operation since it does not have a native representation as randomness defined over bits. Typically, Gaussian samples are generated using a Ziggurat method [22], which is a fairly involved (yet efficient) rejection sampling algorithm. In contrast, the LRW discretisation requires only sampling a binary or ternary random variable, which is a much simpler operation, which ties the discretisation implementation much closer to digital hardware.
IV.3 Robustness to (Quantisation) Error
The form of the LRW discretisation suggests superior robustness to general error in the computation of the drift and diffusion functions compared to traditional methods like Euler-Maruyama. This robustness stems from the fundamental difference in how errors propagate through the computation.
In Euler-Maruyama, errors in the drift and diffusion computations directly accumulate in the continuous-valued update (2). Any quantisation or numerical error in computing or compound over multiple iterations.
In contrast, the LRW discretisation contracts the entire drift and diffusion computation into the sampling of a binary or ternary random variable. This contraction provides a natural form of error suppression: errors in the underlying drift and diffusion functions only affect the probabilities in (4), and these probabilities are then used to sample discrete outcomes. The discrete nature of the output means that small perturbations to the probabilities often result in the same discrete outcome, providing inherent robustness.
Quantisation error is a particularly notable example of this robustness: when the drift and diffusion functions are computed with limited precision (as is dictated by digital hardware), a portion of the resulting quantisation errors in and are absorbed into the probability computation rather than being directly propagated as continuous-valued errors. This makes the LRW discretisation particularly well-suited for implementations on quantised hardware or in scenarios where computational precision is limited, as the discrete output structure naturally mitigates the impact of quantisation artifacts that would otherwise accumulate in continuous-valued methods which assume infinite precision.
Quantisation of the SDE and in particular the drift function is a very prominent consideration in the field of large scale diffusion models where the drift function comprises a very large neural network. Thus quantisation allows significant savings in time of execution, energy consumption and financial cost [40] (this is independent of potential speedups from stochastic computing where these advantages could be amplified significantly further).
IV.4 Non-Lipschitz Drifts
We now turn to the concept of stability of SDE discretisation for a given non-zero stepsize . In this case, the discretised SDE does not have the same solution as the continuous-time SDE, and the degree of separation depends on the discretisation method and the behaviour of the drift and diffusion functions. Of particular interest are drift functions that are not globally Lipschitz continuous, which informally are drift functions that cannot be bounded linearly in all areas of the state space. In reality, global Lipschitz continuity is a very strong constraint [31, 45] that is not typically satisfied by practically relevant SDEs such as those arising in chemistry [20], finance [24], modern machine learning diffusion models [56] as well as Bayesian inference as we will see in Section V.2.
Formally, a drift function is said to be globally Lipschitz continuous if there exists a constant such that
[TABLE]
for all and all .
Traditional proofs of the weak convergence of Euler-Maruyama require the drift function to be globally Lipschitz. Whilst this can be relaxed, see [23], in practice Euler-Maruyama is known to perform poorly for non-globally Lipschitz drifts. The fundamental issue is that the continuous-valued update (2) can lead to explosive behaviour when the drift function grows faster than linearly.
In contrast, the LRW discretisation exhibits superior stability for drifts that are not globally Lipschitz continuous. The key insight is that the discrete nature of the LRW update (3) naturally constrains the magnitude of each step to or [math], regardless of how large the drift function becomes. This bounded step size prevents the explosive behaviour that can occur in Euler-Maruyama.
More formally, we can consider the intuition of the skew-symmetric discretisation from [32] by considering the moments of the increment. For both Euler-Maruyama and LRW we have first moment
[TABLE]
but for the (diagonal) second moment we have
[TABLE]
Thus for fixed stepsize the second moment of the Euler-Maruyama update is unbounded for drifts lacking global Lipschitz continuity, whereas the second moment of the LRW update is independent of the drift function.
V Experiments
In this section, we will present some experiments demonstrating the advantages of the lattice random walk discretisation. Specifically, we demonstrate robustness to quantisation error, stability for non-globally Lipschitz drifts and scalability to large scale state-of-the-art diffusion models.
V.1 Ornstein-Uhlenbeck Process
We compare the effect of floating point quantisation for Euler-Maruyama and the LRW discretisation on traditional hardware with floating point arithmetic. Although as described in Appendix C the LRW state can be reparameterised to integers, however the underlying arithmetic in the drift and diffusion functions still typically requires floating point arithmetic, whose quantisation error we will examine in this experiment.
We consider the multivariate Ornstein-Uhlenbeck process [2]
[TABLE]
where is a symmetric positive-definite matrix, and is a scalar temperature. The OU process has stationary distribution .
We set , and sample where with and . For each seed and discretisation scheme we run steps (noting that it is the number of steps that is fixed rather than the time duration ) starting from discarding the first third as a burn-in. We measure accuracy with where is the Gaussian distribution constructed from the empirical mean and covariance of the samples and is the true stationary distribution.
We start by examining the rule of thumb from Section II.2 in Figure 4. Here we repeat the experiment over a range of stepsizes and . We can see that setting at or slightly larger results in the best performance justifying the rule of thumb (noting that is constant in this experiment). Whilst when the constraints are not met (i.e. above the rule of thumb line), performance degrades rapidly.
We then adopt this rule of thumb and set resulting in a binary update for the quantisation experiment in Figure 4 where we vary the floating point precision of the underlying drift and diffusion arithmetic using 8, 16 and 32 bits. We observe that LRW is as accurate and generally more so than Euler-Maruyama for all stepsizes and precision. Particularly for large stepsizes we observe that the LRW discretisation performs significantly better than the Euler-Maruyama discretisation. For small stepsizes and high precision (32 bits) the Euler-Maruyama discretisation can match the performance of the LRW discretisation but not for low precision (16 bits). For sufficiently small stepsize the error from lack of exploration (small evolution time) dominates. Overall, we see negligible degradation in performance for the LRW discretisation from 32 to 16 bit precision.
It’s also worth commenting that the LRW discretisation (with rule of thumb ) is significantly more robust to the choice of the temporal stepsize , as demonstrated by the flatter bottom of the curves in Figure 4.
V.2 Poisson Random Effects Model (Non-globally Lipshitz)
We now investigate sampling an SDE with a non-globally Lipschitz drift function. Following [32], we consider overdamped Langevin sampling from a Bayesian Poisson random effects model. In particular, we consider the SDE
[TABLE]
where , , , for and .
Following [32] we set but only consider as an interest parameter, we set and . We set underlying true parameters and sample the rest . We initiate the discretisations at the true parameters (therefore do not apply a burn-in) and run for 50k steps. The posterior distribution in is well concentrated around the true parameter and thus we report the mean squared error from the ergodic mean of the generated samples and the true parameter. For each stepsize, we repeat each trajectory generation 100 times with different seeds. As above we set using the rule of thumb (9) justified in Figure 4.
We can see from Figure 5 that the LRW discretisation performs well across all stepsizes whereas the error in the Euler-Maruyama discretisation explodes for larger stepsizes, as also observed in prior works [30, 32].
V.3 Diffusion Model
In this experiment we deploy lattice random walk discretisation to a large-scale state-of-the-art image generation model. We use the popular Stable Diffusion 3.5 model [17] which has over 8 billion parameters. Stable Diffusion 3.5 is technically a flow-matching model [33] and differs in training than that of a continuous-time diffusion model [56]. This class of models are usually viewed as a deterministic ordinary differential equation at inference time. However, as described in Appendix D (which expands on the exposition from [33]), flow-matching models can be recast as SDEs of the form
[TABLE]
where is the learnt score function, is a noise schedule (also determined by the model training), is its time derivative and is a tuning parameter that controls the level of noise added during the trajectory. Remarkably the marginal distributions are the same for all choices of [33].
In Figure 6 we compare the Euler-Maruyama and LRW discretisations for generating prompted images from the pretrained Stable Diffusion 3.5 model [17], which provide and in (14). In all cases, we set the level of noise following [33, 56] with . In this case, unlike the previous experiments, the diffusion term is time-varying, but we can still apply the rule of thumb from Section II.2 with a time-varying .
A quantitative numerical analysis comparing Fréchet inception distance between EM and LRW over varying steps is found in Appendix E.
We can see from Figure 6 that for the 50 timesteps, the LRW discretisation is able to generate images that are equivalent in quality to that of Euler-Maruyama. For 25 timesteps, the LRW images are still of high quality, but exhibit more graininess or similary to the original noise source . We posit that this is due to the fact that the larger second moment of the Euler-Maruyama discretisation (12) is able to propagate the images further away from the initial noise source in absolute terms, this pairs with the prescense of the stocasticity acting as an error correcting mechanism [55].
Overall, Figure 6 confirms that the LRW discretisation is scalable to large-scale, modern stochastic differential equation models for machine learning pipelines.
VI Discussion
In this work, we introduced a novel LRW discretisation for the simulation of stochastic differential equations and proved its weak convergence to the true SDE. Unlike existing discretisations, with the LRW the computation of the drift and diffusion functions only appears in the sampling of a ternary random variable (or binary under certain parameter setting, see Section II.2). This means that the LRW not only entirely avoids the need for Gaussian random variable generation but also is significantly more robust to quantisation error and exploding drift functions, as confirmed numerically in Sections V.1 and V.2. An additional consequence of the bottleneck computation being suppressed into the sampling of a ternary random variable is that the LRW discretisation unlocks the potential to use stochastic computing architectures that have historically struggled with the problem of aggregating output bits to regain high precision output, a problem that is sidestepped entirely by the LRW discretisation.
We conclude with discussing the limitations of the introduced LRW discretisation as well as prominent directions for future work.
VI.1 Limitations
As discussed in Section II.3 the presented LRW discretisation is limited to diagonal diffusion matrices. This is a strong assumption, but in practice is rarely a problem as the vast majority of SDEs in application are formulated with diagonal or scalar diffusion coefficients. Yet, the extension to dense diffusion is a natural direction for future work which we perceive as achievable perhaps inspired by the Lamperti transform (Appendix B), however one would have to be careful to avoid matrix inversions at each iteration which can be costly.
Section II.1 presents a proof of the weak convergence of the LRW discretisation and at weak order 1. Under some restrictions of the SDE (1) weak order 2 convergence can be achieved through relatively straightforward modifications to the Euler-Maruyama discretisation, such as the BAOAB-limit scheme [37] for the case of constant, scalar diffusion coefficient. It may be possible to adopt similar ideas to construct LRW discretisations that are weak order 2 under similar SDE settings, although the LRW requirement that all steps need to be stochastic will need to be carefully considered.
We have not discussed the analysis of strong convergence, which measures the accuracy of individual sample paths rather than expectations. Strong convergence analysis for LRW is significantly more challenging than for traditional methods like Euler-Maruyama due to the discrete nature of the increments and the coupling between the spatial stepsize and temporal stepsize . However, weak convergence is generally more relevant for most applications, particularly in machine learning and statistical inference where we are primarily interested in statistical properties (expectations, variances, etc.) rather than the accuracy of individual trajectories. This is especially true for applications like diffusion models and Bayesian sampling where the goal is to generate samples from a target distribution rather than to accurately track specific paths.
In the numerical study in Section V, we have focussed the experiments on comparison with Euler-Maruyama, despite there being other discretisations to choose from (see Table 1). We have done so because the LRW discretisation is general-purpose in the sense that the only restriction on (1) is that the diffusion matrix is diagonal, in particular it supports time and/or space-varying diffusion coefficients. This in contrast to other discretisations which either assume a time-homogeneity (e.g. BAOAB-limit scheme [37]), or require multiple drift evaluations per step (e.g. Heun’s method [34]). Along with the fact that Euler-Maruyama remains the most popular discretisation for SDEs in modern machine learning diffusion models [33, 56]. Therefore we have focussed on Euler-Maruyama as a general-purpose baseline for comparison, in follow-up work as we focus on more specific applications, we will consider other discretisations that are more appropriate for the application at hand.
We have also not included any experiments on stochastic computing architectures, although we believe that the discrete nature of the LRW discretisation makes it ideally suited for stochastic computing architectures. However, due to the additional complexity of accurately simulating stochastic computing primitives we have decided to limit the scope of this work to the simulation of SDEs and justification of (an exact implementation of) the LRW discretisation in its own right. Future work will detail explicitly potential implementations and simulations of stochastic computing architectures.
VI.2 Future Work
Naturally, the above limitations open up several directions for future work regarding dense diffusion matrices, higher-order schemes and strong convergence as discussed above.
Additionally, the development of specialized stochastic computing hardware implementations represents perhaps the most transformative direction and one we are actively pursuing [46, 4, 10, 1, 13, 12]. The discrete nature of LRW outputs makes it ideally suited for stochastic computing architectures. Indeed, developing dedicated hardware that can efficiently generate and manipulate the binary/ternary random variables could unlock many orders of magnitude speedup for SDE simulation in applications such as large-scale diffusion models, as unlocked by the introduction of the lattice random walk discretisation.
Acknowledgments
We thank Lars Holdijk, Rob Brekelmans, Gavin Crooks, Jan Ole Ernst, and Max Welling for valuable feedback during the writing of the manuscript. We thank the Advanced Research and Invention Agency’s (ARIA) Scaling Compute programme for funding this work.
Author contributions
S.D. invented the binary form of the algorithm. M.A. derived the binary discretization for unit step size. M.A. assisted with the derivation to the more general ternary form. S.D. and D.M. ran the experiments. M.A., Z.B. and P.C. contributed to fundamental early work on a version for linear systems and its connection to stochastic computing. All authors reviewed the manuscript.
Competing Interests
All authors declare no financial or non-financial competing interests.
Appendix A Weak Convergence
We now prove the lattice random walk discretisation for the SDE (1) converges with weak order 1. We restate the theorem from Section II.1 for clarity.
Theorem 1** (Weak convergence of the LRW discretisation).**
Consider the SDE (1) with drift function and diagonal diffusion matrix that are sufficiently smooth. Let be a test function with bounded derivatives. Then the LRW discretisation (3) with spatial stepsize has weak order 1, i.e.,
[TABLE]
where is the discretized solution at time and is the true SDE solution.
Proof.
We prove weak convergence with order 1 by using the infinitesimal generator of the true SDE to bound the local and then global error between expectations with respect to the true SDE and samples from the LRW discretisation.
Generator.
Let be a sufficiently smooth test function and denote by the true solution of (1) with diagonal . Then the generator of the true solution has the form [51, page 50]
[TABLE]
(note the above is the simplified generator in the case of diagonal ).
Increment moments.
Writing with diagonal , then from the definitions of in (4) we get
[TABLE]
and also third moments
[TABLE]
Where in all cases the moments are conditional on .
One‐step expansion.
Expanding in gives
[TABLE]
We conclude
[TABLE]
where we used .
Local weak error.
From the one‐step expansion we have
[TABLE]
so the local weak error is
[TABLE]
constraint.
We now use so that .
Global weak error.
Hence , and over steps the global weak error is
[TABLE]
i.e. the scheme is weak‐order 1 for .
∎
Appendix B Lamperti Transform
Given an SDE with time-varying but not state-varying diffusion coefficient
[TABLE]
define the rescaling
[TABLE]
Then by Itô’s formula one finds
[TABLE]
where is the matrix inverse of and is a scalar.
And we can transform back with
[TABLE]
For more details, including the case of state varying diffusion coefficient see [48].
Appendix C Reparameterisation to Integers
Formally the discretisation in (3) is defined on a lattice of points but not integers since the spatial stepsize is real valued. However we can consider the reparameterization
[TABLE]
This gives an integer update rule in (assuming is an integer)
[TABLE]
with probabilities
[TABLE]
Appendix D From Flows to SDEs
In this section we summarize the framework introduced in [33] unifying diffusion and flow models at inference time as a family of SDEs with flexible amount of noise (including zero noise i.e. an ODE as a special case).
Flow matching models [33] typically learn to reverse a noising process, where the noising process is defined as the mollified Gaussian for some noise schedule
[TABLE]
where we have used rather than to indicate this is the forward or noising process, (before later using to indicate the more complicated reverse or denoising process to match (1)).
The output of the flow matching training is a velocity that can be used within an ordinary differential equation (ODE) solver that reverses the process for
[TABLE]
It can be unnatural to think of an ODE running in reverse time with , instead we can rewrite the above ODE as
[TABLE]
where now we are advancing with .
As described in [33], the trained flow matching model actually recovers a scaled version of the score function . That is, the ordinary differential equation, moving forward in time , can be written as
[TABLE]
where . Since is typically increasing with then is decreasing with and is negative.
We can convert the velocity to the score with knowledge of the noise schedule
[TABLE]
Further, this ODE can be generalised to an SDE by adding any amount of Langevin noise that is invariant for the instantaneous distribution [44]
[TABLE]
matching that of (14) where controls the level of added noise but remarkably does not affect the marginal distributions [33, 16]. A common choice is for some value of (noting that implies ).
Appendix E Fréchet Inception Distance Experiment
In Figure 7, we extend the experiment in Section V.3 to give a numerical quantification of the quality of images generated by LRW - we use the commonly used Fréchet Inception distance to numerically quantify images generated with gold standard images (in this case running SD3.5 with a small stepsize over 100 steps). In each case we generate 100 images using the same prompt as in Section V.3. We see that Euler-Maruyama out-performs at low number of steps (and therefore large stepsize), with LRW catching up for more steps and converging at a similar rate. This behaviour is not unsurprising given the harsh quantisation of the LRW increment and indeed it is somewhat remarkable that LRW performance is close - noting that after 50 steps EM and LRW are not visually distinguishable as seen in Figure 6a.
Appendix F Simlulating Ornstein-Uhlenbeck Processes with Stochastic Computing
As a concrete example of the LRW discretisation with stochastic computing, we can consider simulating an Ornstein-Uhlenbeck process
[TABLE]
An iteration of binary LRW amounts to sampling a bipolar vector with moments
[TABLE]
then we get LRW update .
The bipolar increment expectation can be written as a single matrix vector product
[TABLE]
A common method for (scaled) addition in stochastic computing is the so-called multiplexer approach (e.g. Section 3 in [21]). Generalizing to a dot product we can write the protocol for input stochastic bipolar vector with expected value and real weights as
[TABLE]
where . This gives expected value as the (scaled) dot product
[TABLE]
which can be repeated in parallel for the rows of the matrix-vector product in (18).
To apply this protocol, we encode as a stochastic bipolar vector: assuming , each component is represented by a random bit with , giving . Sampling from the categorical distribution naïvely requires time, but using the alias method [57] with preprocessing, each sample can be drawn in time. This gives preprocessing (for alias tables, one per row of ) and work per LRW iteration, or parallel time with threads since the categorical samples are independent.
For the protocol to be valid, we require for each coordinate . Combined with a bound (which holds if the OU process remains bounded), we obtain a maximal stepsize
[TABLE]
where is the maximum absolute row sum of .
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] M. Aifer, Z. Belateche, S. Bramhavar, K. Y. Camsari, P. J. Coles, G. Crooks, D. J. Durian, A. J. Liu, A. Marchenkova, A. J. Martinez, et al. (2025) Solving the compute crisis with physics-based asics . ar Xiv preprint ar Xiv:2507.10463 . Cited by: §VI.2 .
- 2[2] M. Aifer, K. Donatella, M. H. Gordon, S. Duffield, T. Ahle, D. Simpson, G. Crooks, and P. J. Coles (2024) Thermodynamic linear algebra . npj Unconv. Comput. 1 ( 1 ), pp. 13 . External Links: Document Cited by: §V.1 . · doi ↗
- 3[3] A. Alaghi and J. P. Hayes (2013) Survey of stochastic computing . ACM Transactions on Embedded computing systems (TECS) 12 ( 2s ), pp. 1–19 . Cited by: §IV.1 , §IV.1 .
- 4[4] Z. Belateche (2025-04) Scaling thermodynamic compute . Note: Normal Computing Blog Blog post; accessed 2025-08-21 External Links: Link Cited by: §VI.2 .
- 5[5] G. E. Blelloch (1996) Programming parallel algorithms . Communications of the ACM 39 ( 3 ), pp. 85–97 . Cited by: §IV.1 .
- 6[6] N. Bou-Rabee and E. Vanden-Eijnden (2018) Continuous-time random walks for the numerical solution of stochastic differential equations . Vol. 256 , American Mathematical Society . Cited by: §III .
- 7[7] K. Burrage, P. M. Burrage, and T. Tian (2004) Numerical methods for stochastic differential equations . Acta Numerica 13 , pp. 197–246 . Cited by: §I .
- 8[8] K. Burrage and P. M. Burrage (1996) High strong order explicit runge–kutta methods for stochastic ordinary differential equations . Applied Numerical Mathematics 22 ( 1–3 ), pp. 81–101 . Cited by: Table 1 .
