A Monte Carlo method for computing the action of a matrix exponential on a vector
Juan A.Acebron

TL;DR
This paper introduces a Monte Carlo approach for efficiently computing the action of a matrix exponential on a vector, leveraging random paths and Markov chains, and demonstrates its superior performance on large-scale problems.
Contribution
It extends existing Monte Carlo methods by incorporating probabilistic path generation for matrix exponential actions, enabling efficient large-scale computations.
Findings
The algorithm effectively computes single entries and full vectors.
It outperforms Krylov-based methods on large-scale problems.
Benchmarks show significant efficiency gains.
Abstract
A Monte Carlo method for computing the action of a matrix exponential for a certain class of matrices on a vector is proposed. The method is based on generating random paths, which evolve through the indices of the matrix, governed by a given continuous-time Markov chain. The vector solution is computed probabilistically by averaging over a suitable multiplicative functional. This representation extends the existing linear algebra Monte Carlo-based methods, and was used in practice to develop an efficient algorithm capable of computing both, a single entry or the full vector solution. Finally, several relevant benchmarks were executed to assess the performance of the algorithm. A comparison with the results obtained with a Krylov-based method shows the remarkable performance of the algorithm for solving large-scale problems.
| CPU Time SM (s) | CPU Time SC (s) | |
|---|---|---|
| Network size | CPU Time MC (s) | CPU Time Matlab (s) |
|---|---|---|
| Sample size M | isim () | isim () |
|---|---|---|
| Sample size M | isim () | isim () |
|---|---|---|
| isim () | isim () |
|---|---|
| isim () | isim () | |
|---|---|---|
| isim () | isim () | |
|---|---|---|
| isim () | isim () |
|---|---|
| Network size | CPU Time MC (s) | CPU Time Matlab (s) |
|---|---|---|
| Network type | Size | CPU Time MC (s) | CPU Time Matlab (s) |
|---|---|---|---|
| Wikipedia | |||
| USA roads | |||
| Europe OSM |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMatrix Theory and Algorithms · Theoretical and Computational Physics · Markov Chains and Monte Carlo Methods
A Monte Carlo method for computing the action of a matrix exponential on a vector
Juan A. Acebrón
Dept. Information Science and Technology, ISCTE-University Institute of Lisbon, Portugal
INESC-ID,Instituto Superior Técnico, Universidade de Lisboa, Portugal
Abstract
A Monte Carlo method for computing the action of a matrix exponential for a certain class of matrices on a vector is proposed. The method is based on generating random paths, which evolve through the indices of the matrix, governed by a given continuous-time Markov chain. The vector solution is computed probabilistically by averaging over a suitable multiplicative functional. This representation extends the existing linear algebra Monte Carlo-based methods, and was used in practice to develop an efficient algorithm capable of computing both, a single entry or the full vector solution. Finally, several relevant benchmarks were executed to assess the performance of the algorithm. A comparison with the results obtained with a Krylov-based method shows the remarkable performance of the algorithm for solving large-scale problems.
keywords:
Monte Carlo methods, matrix functions, network analysis, communicability
PACS:
65C05 , 65C20 , 65N55 , 65M75 , 65Y20
1 Introduction
Computing the action of a matrix function on a vector is experiencing these days a reborn interest. This is not because of the absence of relevant applications in science and engineering in the past, but rather because the improvement in the numerical methods, along with the advent of highly massive parallel computers, now allows one to be able to attack more realistic problems on a large scale, far beyond the merely academic problems capable of being simulated in the past. These applications include circuit simulations [36]; power grid simulation [37, 33]; nuclear reaction simulations [31]; analysis of transient solutions in Markov chains [34]; simulations of quantum systems [32]; numerical solution of partial differential equations (PDEs) [24, 27]; and analysis of complex networks [5].
Circuit and power grid simulation play an important role during the design of integrated circuits, being in general a heavy computationally task of the whole design process. In the analysis of a reactor fuel, a computational heavy task of the analysis consists in solving the burnup equations describing the rates of the concentration of the different nuclides of the nuclear fuel. Computing the action of a matrix exponential over the initial state appears as an important numerical alternative among the different available techniques, such as integrating the Chapman-Kolmogorov system of differential equations for obtaining the transient solution of homogeneous irreducible Markov chains. In the field of partial differential equations, numerically solving a boundary-value PDE problem by means of the method of lines requires in practice to compute the action of a matrix exponential over the initial condition. Finally, in network analysis, determining some important metrics of the network, such as for instance the total communicability which characterizes the importance of the nodes inside the network, entails computing the exponential of the adjacency matrix of the network.
Over the last few years many numerical methods have been proposed for computing the action of a matrix function over a vector. There are already excellent reviews in the literature describing the different numerical methods proposed so far (see [10, 20, 22, 23], e.g.); therefore it is not intended to go into any details here. Instead we will briefly describe some of them for those readers not familiar with the topic. Essentially we can classify these methods as follows: Krylov-based subspace methods, contour integration methods, ordinary differential equation methods, and polynomial or rational methods. One of the most studied methods in theory, and used in practice, are those based on the Krylov subspace method. The idea behind the method consists in projecting the given (typically large) matrix onto a Krylov subspace. For the specific case of an exponential function, through a basis of the subspace constructed using the Arnoldi process, the exponential of the projected matrix is computed by using a standard technique based typically on the squaring and scaling method [22].
The idea of using probabilistic methods based on Monte Carlo simulations for computing functions of matrices goes back to the pioneering work of von Neumann and Ulam during the 1940’s [18]. Although initially the method was proposed merely for computing the inverse of a matrix, it was later generalized for solving linear algebra problems in a series of seminal papers, see [13] e.g., and [12] for further references. Briefly the underlying idea consists in generating a discrete Markov chain which evolves by random paths through the different indices of the matrix. Mathematically, the method can be seen in a way as a Monte Carlo sampling of the Neumann series of the matrix. The convergence of the method was rigorously established in [26], and improved further more recently (see for instance [15], and [7] just to cite a few references). More recently, and for the specific case of computing the action of a Hermitian matrix exponential over a vector, which is of interest in Quantum Physics, it has been proposed in [32] an efficient algorithm based on a novel randomized linear algebra technique known in the literature as the Nyström method.
These probabilistic methods offer important computational advantages. Furthermore, the algorithms are much simpler to code than their deterministic counterparts, which impact positively in promoting an easy further optimization of the codes; it turns out that they are specially suited for parallel computing. This is because the solution is often computed through an expectation value of a given finite sample, the simulations are independent from each other. This is of paramount importance because it allows for the development of parallel codes with extremely low communication overhead among processors, and has a positive effect on properties such as scalability and fault-tolerance. For the parallel implementation of the Monte Carlo method for solving linear algebra problems see [14] e.g.
Another important advantage of the probabilistic methods consists in the feasibility of computing the solution of the problem at specific chosen points, without the need for solving globally the entire problem. This remarkable feature offers important advantages in dealing with some specific applications found in science and engineering, and it has been explored for efficiently solving continuous problems such as boundary-value problems for PDEs in [1, 2, 3], and references therein. However, this is not an exclusive advantage of the probabilistic methods. In fact, for the specific problem of matrix functions, it has been proposed in the literature several methods [19] capable also of estimating individual entries of the matrix function. The main idea consists in applying several quadrature rules along with a single iteration step of the Lanczos algorithm to obtain a priori lower and upper bounds. Moreover the bounds can be further improved a posteriori using several iterations of the Lanczos algorithm in the quadrature formula. This idea has been applied succesfully to the problem of estimating different metrics of complex networks in [8]. The computational cost has been estimated to grow linearly with the matrix size in the best case, since for each iteration of the Lanczos algorithm is required to compute a matrix-vector multiplication.
The purpose of this paper is to extend the existing aforementioned Monte Carlo methods for dealing with other functions of matrices, such as the matrix exponential, and more specifically for the problem of computing the action of a matrix exponential on a vector for a certain class of matrices. This is done by resorting to a probabilistic representation of the vector solution based on generating random paths corresponding to samples of a suitable continuous-time Markov chain. The convergence of the method was conveniently analyzed, as well as the computational cost estimated. In addition, several relevant numerical examples, extracted from network analysis are given, focusing on both, the accuracy and performance of the method.
The paper is organized as follows. The probabilistic representation of the vector solution is presented in Sec. 2. In Sec. 3, it is explained how the probabilistic representation can be implemented in practice. Secs. 4 and 5 are devoted to the analysis of the computational cost of the algorithm, and the associated numerical errors of the method, respectively. Finally in Sec. 6 several benchmarks are executed to assess the performance of the method in comparison with the performance obtained by the classical Krylov-based method. To conclude we summarize the main results and discuss potential directions for future research.
2 The numerical method
Let a given sparse -by- symmetric matrix, an n-dimensional vector, and an n-dimensional vector solution of evaluating . We assume that can be decomposed as , where is the Laplacian matrix symmetric and irreducible [28], and a diagonal matrix with entries . Since in general both matrices do not commute, it does not hold that . However, an approximation of the matrix exponential can be easily obtained by resorting to the exponential Lie splitting, and yields
[TABLE]
Here , and in the following for convenience it will termed as the time step. It is known that the local error of the Lie splitting after one time step is given by
[TABLE]
being in general of order of for the global error. A higher order approximation does exist, and in view of being the matrix diagonal, it can be computed without any additional computational cost. In fact, the well known Strang splitting yields,
[TABLE]
The local error after one time step of this approximation is known [25] to be
[TABLE]
and globally of order of . The next lemma will be used to derive a probabilistic representation for the vector solution , in the following denoted as for simplicity. To this purpose we need first to prove the following useful fact for the partial solution .
Lemma 1
Assume is a discrete random variable that takes values on with probability given by the transition probabilities of a continuous-time Markov chain generated by the infinitesimal generator and evaluated at time . Then, any entry of the vector
[TABLE]
can be represented as , with , and its expected value.
Proof. Since is a diagonal matrix, can be computed as follows
[TABLE]
By the definition of the unnormalized Laplacian matrix of a graph , is a matrix with diagonal elements equal to the degree of each vertex , and the off-diagonal , if is an edge, or [math] otherwise. Therefore, it follows that , and , and hence the matrix can be assumed to be a generator of a suitable continuous-time Markov chain on the state space . Then,
[TABLE]
where are the corresponding transition probabilities of the Markov chain evaluated at time , solution of the Kolmogorov’s backward equations,
[TABLE]
for the matrix transition probability .
Note that such a probabilistic representation allows in practice to compute a single entry of the vector solution. This is done by generating suitable random paths, corresponding to a continuous-time Markov chain, which evolve backward in time from the state at to a final state on S for . Finally, the chosen entry is computed by averaging the functional over the sample. Such a functional depends on the initial vector and the diagonal matrix . Moreover, applying this Lemma to Eq. (3) allow us to derive the following general theorem.
Theorem 2
Let , , a sequence of discrete random variables with outcomes on . The probabilities , , and for , correspond to the transition probabilities of a continuous-time Markov chain generated by the same infinitesimal generator and evaluated at time for each . Then, we have that any entry of the vector solution in Eq. (3) can be represented probabilistically as
[TABLE]
where , , and .
Proof. In view of being a diagonal matrix, from Eq. (3) the entry of the vector is given by
[TABLE]
As proved in the Lemma above, the matrix can be assumed to be the generator of a continuous-time Markov chain with transition probability matrix . Therefore, the equation above can be rewritten as
[TABLE]
Similarly to Lemma 1, a neat picture of this general probabilistic representation can be described as follows: A random path starting at the chosen entry is generated according to the continuous-time Markov chain governed by the generator , and evolves in time by jumping randomly from to any state on S. Along this process, given functions are evaluated, and the solution is obtained through the expected value of a suitable multiplicative functional.
The Lemma 1 and Theorem 2 can be conveniently modified to represent probabilistically the complete vector solution . In fact, it is worth observing that the transpose of the generator , , corresponds to the generator of the continuous-time Markov chain generated rather forward in time, being in this case the matrix transition probability solution of Kolmogorov’s forward equations,
[TABLE]
This can be used to generate instead a random path that starts at a given state according to a specific initial distribution, and evolves forward in time governed by a continuous-time Markov chain generated by a suitable generator, which in practice corresponds to the transpose of the generator of the backward equation. However, note that , since the matrix is symmetric, and therefore it holds that the generator of the continuous Markov chain forward in time coincides with the generator backward in time. This is mathematically formalized through the following Lemma:
Lemma 3
Let and be discrete random variables on the state space . The random variable is governed by the probability function provided , while the random variable by the probability function being the transition probabilities of a continuous-time Markov chain generated by the infinitesimal generator and evaluated at time . Then, the vector , can be represented probabilistically as
[TABLE]
where , , and its expected value.
Proof. Since the proof is similar of the previous Lemma’s proof, for completeness we sketch here only the main differences. From Eq. (6) in Lemma 1, can be rewritten as follows
[TABLE]
Whenever , can be defined as a suitable probability function for the discrete random variable on S. Similarly than in the previous Lemma, is the transition probability matrix for the continuous-time Markov chain generated by , therefore it holds
[TABLE]
and hence can be represented as .
As before, we used the Lemma 1 to formulate a general theorem, which allows in practice to represent probabilistically the vector solution .
Theorem 4
*Let , , and , be discrete random variables with outcomes on
. The probabilities , , and for correspond to the transition probabilities of a continuous-time Markov chain generated by the same infinitesimal generator for each and , while for the random variable the probability is given by provided . Then, we have that the vector solution in Eq. (3) can be represented as*
[TABLE]
where , , and , and .
3 The algorithm
To implement the theorems above in order to obtain numerically a single entry or rather the full vector solution, we need to choose a finite sample of given size , replacing in such a way the expected value by the arithmetic mean. Accordingly this entails a statistical error that it will be discussed in the next section. Here we present the algorithm implemented so far to compute the solution, but before doing that we describe the numerical method used to generate in practice the continuous-time Markov chain. Let be the transition probability matrix, then the Kolmogorov backward equation in Eq. (8) can be equivalently represented as the following system of integral equations
[TABLE]
where . Let be a sequence of independent exponential random times picked up from the exponential probability density . The integral equations above, along with the sequences of random times, can be used to simulate a path according to the following recursive algorithm: Generate a first random time that obeys the exponential density function. Then, depending on whether or not, two different routes are taken. If , the algorithm is stopped, and no jump from the state to a different state is taken. If, on the contrary, , then the state jumps to a different state according to the probability function , and a new second random number exponentially distributed is generated. If the algorithm proceeds repeating the same elementary rules, otherwise it is stopped. To illustrate the procedure above, in Fig. 1 we show two random paths of a continuous-time Markov chain corresponding to an adjacency matrix of a small world network of size .
Algorithm 1 Algorithm to compute a single entry of the vector solution
for do
,
for do
generate()
while do
generate(S), generate(j)
end while
end for
end for
Algorithm 1 describes a pseudocode corresponding to the implementation of Theorem 2, which allows one to compute single entries of the vector solution, while Algorithm 2 describes the pseudocode for computing probabilistically the complete vector solution , mathematically formalized in Theorem 4.
Algorithm 2 Algorithm to compute the complete vector solution
for do
generate(i)
for do
generate()
while do
generate(S), generate(j)
end while
end for
end for
4 Computational complexity of the Monte Carlo algorithm
To estimate properly the computational cost of the algorithms above, the tasks inside the two nested loops were analyzed separately from those that were composed inside the do-while loop, and those outside it. Hence,
[TABLE]
Note that the computational cost of the inside task, which accounts for the time spent in generating the sequences of random times for the evolving paths, depends on the given matrix, while the cost of the outside tasks are totally independent. In fact, the computational time requires to generate a random paths is random, and in practice depends on the specific entries of the row of the matrix randomly visited by the paths, including the connectivity and degree of the adjacency matrix of the associated graph. More specifically, the degree affects directly the time spent by the algorithm inside the do-while loop, since the mean time of the exponential probability density governing the random time is given by . In view of the random jumping through the rows of the matrix, in the following let us assume that the computational time spent in total for all random paths can be reasonably approximated as
[TABLE]
Here is the corresponding mean time value obtained for a suitable matrix, and its associated adjacency matrix, with an average degree given by
[TABLE]
and the corresponding proportionality constant. Such a constant takes into account the computational cost for evaluating the two functions (, ) in Algorithms 1 and 2. These functions are responsible for generating both, a random time for the evolving paths (), and a random number governed by the probability function defined in Eq. (17), which determines the row of the matrix where to jump (). Note that the cost for generating the exponential random time using the function is fully independent of the matrix size, which is not the case when generating the random jumping using the function . In fact, for a matrix with arbitrary different matrix coefficients the probability function described by is in general nonuniform, and therefore the cost for generating a random increases at most linearly with the matrix size. However, to improve the performance of this function more efficient algorithms can be implemented, such as the row-searching method based in practice in a binary search tree method as proposed in [32]. Nevertheless, interestingly, for the specific case of the adjacency matrix of undirected networks it turns out that such a probability function becomes uniform, since all nodes are equally probable to jump, and therefore the computational cost for generating the random becomes fully independent of the matrix size. Indeed such a random can be trivially generated by simply multiplying a random number uniformly distributed between [math] and by the degree of the node, and finally rounding to the nearest integer.
Concerning the time spent by the remaining outside tasks, it can be readily estimated as
[TABLE]
where is the corresponding proportionality constant. Recall that , therefore it holds that
[TABLE]
Note that for a sufficiently small , the predominant term is the second one, which appears to be almost independent of the matrix, while for a large the opposite behavior occurs. This is in good agreement with the results shown in Table 1, corresponding to the CPU time spent by the Monte Carlo algorithm when computing the total communicability of two different networks characterized by different average degrees.
5 Numerical errors
In computing a single entry of the vector solution or the complete vector, we should consider, in practice, two sources of numerical error. In fact, we have to face the error due to the splitting in Eq. (2) and (4), and the error coming from necessarily replacing the expected value in (16) by a finite sum over a given finite sample of size . In the following we focus exclusively on the numerical scheme for computing the full vector solution, since the analysis of the error for the companion method for computing a single entry turns out to be identical. To be more precise, the global error made in computing probabilistically the vector solution can be evaluated as
[TABLE]
where corresponds to the realization of the random variable defined in Theorem 4, and
[TABLE]
As mentioned already in Sec. 2, the first error is due merely to the splitting procedure, and the error is of the order of or depending on whether the Lie or the Strang splitting is used.
The second error, , is the pure Monte Carlo statistical error, and of order of . In fact, it is well known that the arithmetic mean appearing in (23) provides the best unbiased estimator for the expected value in (16). In practice, one should simulate on the computer the random variables, based on generating random numbers. By doing so, the error made in replacing the expected value with the mean over a finite size sample is statistical in nature. More precisely, turns out to be, for a large value, approximately a random Gaussian variable with standard deviation proportional to , i.e.,
[TABLE]
where denotes the square root of the variance, and is a standard normal (i.e., ) random variable. All this clearly shows that the proposed Monte Carlo method could in principle have a poor numerical performance, and also that the error is merely statistical, so it can only be bounded by some quantity with a certain degree of confidence. However, there already exist many available statistical techniques, such as variance reduction, multilevel Monte Carlo, and quasi-random numbers, that can be used, in practice, to improve greatly the order of the global error, and consequently the overall performance of the algorithm.
To illustrate the global error of the numerical method, and its convergence, several examples were run to examine the specific problem of computing the total communicability of a complex network (see [6] e.g.) for different network sizes. The error was computed assuming the solution obtained using the built-in function expm of Matlab as if it were the theoretical solution. The underlying algorithm consists essentially of a rational approximation by means of the Padé approximation of an underlying series expansion of the matrix, along with a direct method for computing the inverse of a suitable linear algebra problem, which in practice entails an LU decomposition. Therefore, since it is based on more highly accurate methods, we can assumed it to be a highly accurate approximation of the generally unavailable theoretical solution. In Fig. 2 the absolute error is plotted as a function of the time step for a network of 100 nodes in (a), and 1000 in (b), and the two different splitting methods. The solid line corresponds in practice to the purely splitting error , while the dashed line corresponds to the error of the complete Monte Carlo method . Surprisingly, both the Lie and Strang splitting seem to show the same order of the error (which is as can be seen from the slope of the ancillary function plotted in the figure to help the reader). This can be readily explained from Eq. (2) and the definition of total communicability. Indeed, by definition of the total communicability of a network [6]
[TABLE]
where is a vector of ones, the scalar product, and since , it holds that
[TABLE]
Therefore, from (2) the order of the local error for the Lie splitting turns out to be of order , and in particular of order as the global error, as it happens for the Strang splitting. However, quantitatively the global error for the Strang splitting appears to be smaller than the error obtained with the Lie splitting. The reason can be found in that the proportionality constant multiplying for the Strang splitting is smaller. Rather, this does not occur when computing the communicability of a single node, which in practice entails computing a single entry of the vector solution. In fact, Fig. 3 shows that the error of the Strang splitting is one order of magnitude larger than for the Lie splitting, being in both cases the order of convergence as was theoretically expected.
Moreover, it is worth observing that the absolute error tends to a constant value for sufficiently small values of , being such a critical value larger when a smaller sample size is used. This is because for this range of values of the global error comes predominantly from the statistical error (which is independent of ), since the splitting error is already much smaller than the statistical error. In practice this means that from a given value of it becomes useless to reduce further the time step , and consequently increase the computational cost. From that point the error becomes mostly statistical, being therefore required rather to increase the sample size in order to continue reducing the global error. In fact, in Fig. 4 the time step was chosen to be sufficiently small, . This makes the splitting error negligible compared with the statistical error, and therefore the absolute error shown is mostly statistical, decreasing as as expected by theory.
For the specific problem of computing the total communicability of a network in Eq. (27), it can be analytically estimated how the numerical error depends on the network size. Concerning the error due to exponential splitting, from Eq. (4) and since , we obtain
[TABLE]
By using the Cauchy-Schwarz inequality, the error can be bounded as follows
[TABLE]
where corresponds to the maximum degree of the network. Therefore, for those networks characterized by having a maximum degree almost independent of the network size, such as the small-world networks, the error is almost independent of the size. In fact, this is in agreement with the results plotted in Fig. 2, where the total communicability has been computed for two different network sizes. Rather for those other networks where the maximum degree increases with the network size, the error increases with the network size requiring therefore to reduce accordingly to keep constant the error. Concerning the other source of errors, as it was mentioned above this concerns the statistical error due to the finite sample of the Monte Carlo simulations. However, this error turns out to be independent of the network size as it is shown in Fig. 4. This should not be so surprising since as it happens for Monte Carlo integration of definitive integrals, the underlying error when computing numerically the expected value of the function does not depend on the number of dimensions of the integral. In fact, this is the main advantage of Monte Carlo integration against most deterministic methods, which it is known to grow exponentially with the dimensions.
6 Some results and benchmarks
To illustrate the numerical method and performance of the underlying algorithm, in the following we show the results corresponding to several benchmarks run so far. They concern the numerical computation of the metric communicability in both synthetic and real complex networks. For comparison with other methods, and to estimate the numerical errors, the Matlab toolbox funm kryl developed in [4], and freely available in [21], has been used. Such a code implements a Krylov subspace method with deflated restarting for matrix functions. Concerning the Monte Carlo algorithm, it was implemented in Fortran 90, and for a fair comparison with the performance obtained with the Matlab code, no further optimization of the code or the Fortran compiler was performed. Moreover, Fortran is purely sequential, while Matlab by default employs multi-threading architecture for running simulations. Therefore, to make a fair comparison between the systems, Matlab’s multi-threading feature was completely disabled. The simulations were run on a computer equipped with an Intel Xeon CPU E5620 at GHz and GB of RAM.
Small-world networks. These networks were generated in Matlab using the function smallw, freely available through the toolbox CONTEST [11]. In Table 2 the computational time to compute the total communicability of the network for about the same error is shown for different network sizes. Up to a network size of nodes, the error was estimated using Matlab’s built-in function expm as if it were the theoretical solution. For increasingly larger networks however, the high computational cost of this function makes its computation a formidable task, making it necessary therefore to resort to other methods to estimate the numerical error. For such a purpose, the aforementioned Krylov-based method was used by setting a very small value of the stopping-accuracy parameter, , as well as the restart parameter to . These are also the parameters that have been modified in order to obtain a similar error with the Monte Carlo simulations.
It is remarkable to note that the computational cost of the Monte Carlo method appears to be almost independent of the size of the network, while increasing almost linearly for the Krylov-based method. For the Monte Carlo method this can explained by the fact that the maximum degree of the network is almost independent of the network size, and consequently as explained in Sec. 5, the numerical error. Therefore, to compute the solution within a given prescribed accuracy it is not required to modify the values of the sample size , and the time step for increasingly larger sizes, thereby ensuring the same computational cost of the algorithm for any network size. Such a feature allows the Monte Carlo method to achieve a computational performance which is notably higher than the classical counterpart, based on the Krylov-based method, for large scale problems. In fact, it has been pointed out in the literature [8, 32] and more specifically in [30] through a suitable Theorem, that there exists an algorithm based on the Lanczos method capable of computing the vector solution of the action of a matrix exponential over a given vector in a time that grows linearly with the matrix size. This is mostly due to the sparsity of the adjacency matrix of the network, which simplifies considerably the cost of the matrix-vector products associated to each Lanczos iteration. These theoretical findings are therefore numerically confirmed in Table 2, where it can be seen indeed that the CPU time for the Krylov-based method increases almost linearly with the network size.
As discussed in Sec. 3, the Monte Carlo method can be used as well to compute the full vector solution . In the particular case of complex networks, and when is the vector with all entries equal to 1, the vector solution represents the total subgraph communicability of each node of the network [6]. More important than the quantitative values of the entries of the vector, it is the insight obtained through the ranking of the network organized by the importance of its nodes in terms of being more or less communicable inside the network, which could be of primary importance in the field of complex networks. For this purpose, and to evaluate the similarities between the rankings obtained with the Monte Carlo method and the Krylov-based method, we use the intersection distance method [9] on both the full set of nodes of the network, and on of them. The intersection similarity distance for the top K nodes of two vectors and is defined as
[TABLE]
Here is the symmetric difference operator between the two vectors. In practice, small values of the intersection distance denote large similarities between the vectors, while the limiting value of suggests vectors that are totally disjointed. Since computing the intersection distance could be computationally costly for increasingly large networks, only relatively small sizes of the network were analyzed so far. In Table 3(b) the results corresponding to networks composed of and nodes are shown. Note that the two ranked vectors show strong similarities, being even stronger for larger network sizes.
Scale-free networks. Such networks have been generated using the function pref belonging to the aforementioned toolbox CONTEST. In contrast to the small-world network, these networks are characterized by the presence of hubs, which in practice entail a much larger maximum degree, and correspondingly larger maximum eigenvalue than for the small-world networks. For this reason, and to avoid dealing with very large values when computing the total network communicability for large networks, it is more convenient instead to analyze the so-called normalized total communicability [6], which corresponds in practice to the average total communicability of the network per node. This metric can be readily obtained dividing the total network communicability by the network size, that is . Since the value of the maximum degree increases with the network size, then in order to keep constant the numerical error it may be necessary for the Monte Carlo method to reduce the time step (or equivalently increasing the parameter ) accordingly. From Eq. (30), and assuming as an upper-bound approximation, it holds that the time step should reduce, at most, as (or equivalently the parameter increase linearly with ), to ensure a constant numerical error when computing the normalized total network communicability for arbitrary large scale-free networks. As a result, the computational cost of the algorithm, estimated in Sec. 4 as being for sufficiently small , increases therefore linearly with the network size for these type of networks.
To avoid such a computationally costly procedure, a reasonable alternative relies on computing a generalization of the communicability, that is , where is typically interpreted as an effective ”temperature” of the network (see [17], e.g.). Essentially the idea is to use the inverse of the maximum eigenvalue as the value of the parameter , which in practice will control the rapid growth of the norm of the matrix with the size of the network. To ensure fast convergence of the Monte Carlo solution, using , where is the maximum eigenvalue of A, should suffice. However, finding the maximum eigenvalue for large networks is computationally costly, and in the following a faster alternative, based on computing the maximum degree of the network, , was used instead as an upper bound value. Note that in doing that the numerical error obtained when computing the normalized total network communicability becomes independent of the network size. This can be proved readily as follows: From Eq. (30), the error to compute the normalized total network communicability is given by,
[TABLE]
where the time-step defined in Eq. (1) was replaced by . Now using as an upper-bound approximation, then it follows that the error becomes indeed totally independent of the network size, and hence the computational cost of the algorithm.
However, different values of could have a direct impact not only on the entries of the communicability vector, but also on the ranking of the nodes according to their communicability values, and therefore it becomes essential to analyze at least qualitatively such an issue. In Table 4(b) the similarity results of two ranked communicability vectors are shown for two different network sizes. All the vector solutions are computed this time using the function expm of Matlab to minimize the error, and the comparison is done by choosing as the reference vector the ranked communicability vector with .
From Table 4(b) it is worthwhile to observe the close similarity of the ranked vectors for different values of , being even closer for larger values of the network size. Recall that for the typical accuracy asked for Monte Carlo simulations, the error is already higher ( in the previous examples) than the values obtained for the intersection similarities. This fact can be exploited in practice to choose values of smaller, and consequently larger, and still be able to characterize properly the communicability of the network, being indeed indistinguishable within the prescribed accuracy for the Monte Carlo simulations.
In Table 5 the times spent to compute the normalized total communicability of scale-free networks of different sizes are shown. As for small-world networks, in all Monte Carlo simulations the error has been kept fixed to . Similar to the results obtained for the small-world network, the Monte Carlo method outperforms the Krylov-based method for large size networks, having also a computational time independent of the network size, in agreement with the theoretical considerations discussed above.
Real networks. In Table 6 the results corresponding to a few real networks of arbitrary large size are shown. These networks were downloaded from the freely available sparse matrix repository SuiteSparse Matrix Collection [35], and correspond to undirected graphs describing the largest strongly connected components of the corresponding Open Street Map road networks in Europe (Europe OSM), the USA roads (USA roads), and finally a directed graph corresponding to Wikipedia. The latter network was conveniently symmetrized following the procedure described in [5]. As in the previous examples, the performance of the Monte Carlo method is notably superior to the Krylov-based method, being that the differences are even more pronounced for large network sizes.
7 Conclusion
A new Monte Carlo method for computing the action of an exponential matrix on a vector has been proposed. The method is based on generating suitable random paths corresponding to a continuous-time Markov chain governed by the associated Laplacian matrix. It extends the existing Monte Carlo methods discussed so far in the literature for solving linear algebra problems, for dealing now with more involved functions of matrices such as the matrix exponential. An important advantage of the Monte Carlo method is that the probabilistic representation of the solution allows for efficiently computing single entries of the vector solution, along with global metrics involving the full matrix, such as the total communicability in the field of complex networks. Moreover, since the solution is obtained through averaging independent calculations it is especially well suited for parallel computation. In fact, it is known that the Monte Carlo algorithm turns out to be fully scalable and naturally fault tolerant. To test the performance of the algorithm, several benchmarks have been used, consisting of a variety of complex networks (real and synthetic) for computing the communicability of the network. The numerical errors of the method have been analyzed through the paper. The results have been compared with a classical Krylov-based method, showing a notably superior performance of the algorithm for large-scale matrices, both in terms of computational time and memory requirements.
Acknowledgments
This work was supported by Fundação para a Ciência e a Tecnologia under Grant No. UID/CEC/50021/2019.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] J.A. Acebrón, M.P. Busico, P. Lanucara, and R. Spigler, Domain decomposition solution of elliptic boundary-value problems , SIAM J. Sci. Comput., 27 (2005) pp. 440-457.
- 2[2] J.A. Acebrón, and A. Rodríguez-Rozas, Highly efficient numerical algorithm based on random trees for accelerating parallel Vlasov-Poisson simulations , J. Comput. Phys., 250 (2013) pp. 224-245.
- 3[3] S. Mancini, F. Bernal, and J.A. Acebrón, An Efficient Algorithm for Accelerating Monte Carlo Approximations of the Solution to Boundary Value Problems , J. Sci. Comput., 66 (2016) pp. 577-597.
- 4[4] M. Afanasjew, M. Eiermann, O.G. Ernst, and S. Güttel, Implementation of a restarted Krylov subspace method for the evaluation of matrix functions , Linear Algebra Appl., 429 (2008) pp. 2293-2314
- 5[5] M. Benzi, E. Estrada, and C. Klymko, Ranking hubs and authorities using matrix functions , Linear Algebra Appl., 438 (2013) pp. 2447-2474.
- 6[6] M. Benzi, and C. Klymko, Total communicability as a centrality measure , J. Complex Networks, 1 (2013) pp. 124–149.
- 7[7] M. Benzi, T.M. Evans, S.P. Hamilton, M.L. Pasini, and S.R. Slattery, Analysis of Monte Carlo accelerated iterative methods for sparse linear systems , Numerical Linear Algebra with Appl., 24 (2017).
- 8[8] M. Benzi, and P. Boito, Quadrature rule-based bounds for functions of adjacency matrices , Linear Algebra Appl., 433 (2010) pp. 637-652.
