Lattice Gaussian Sampling by Markov Chain Monte Carlo: Bounded Distance Decoding and Trapdoor Sampling
Zheng Wang, Cong Ling

| 1.087 | |||
|---|---|---|---|
| 1.0039 | |||
| 1.00037 | |||
| 1.0002 | |||
| 1.0001 |
Zheng Wang and Cong Ling

This work was presented in part at the IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, July 2016. Z. Wang is with the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics (NUAA), Nanjing, China; C. Ling is with the Department of Electrical and Electronic Engineering, Imperial College London, London, SW7 2AZ, United Kingdom.
Abstract
Sampling from the lattice Gaussian distribution plays an important role in various research fields. In this paper, the Markov chain Monte Carlo (MCMC)-based sampling technique is advanced on several fronts. Firstly, the spectral gap for the independent Metropolis-Hastings-Klein (MHK) algorithm is derived, which is then extended to Peikert’s algorithm and rejection sampling; we show that independent MHK exhibits faster convergence. Then, the performance of bounded distance decoding using MCMC is analyzed, revealing a flexible trade-off between the decoding radius and complexity. MCMC is further applied to trapdoor sampling, again offering a trade-off between security and complexity. Finally, the independent multiple-try Metropolis-Klein (MTMK) algorithm is proposed to enhance the convergence rate. The proposed algorithms allow parallel implementation, which is beneficial for practical applications.
Keywords: Lattice decoding, lattice Gaussian sampling, Markov chain Monte Carlo, bounded distance decoding, large-scale MIMO detection, trapdoor sampling.
I Introduction
Lattice Gaussian sampling has recently drawn considerable attention in various research fields. In mathematics, Banaszczyk was the first to apply it to prove the transference theorems for lattices [1]. In coding, the lattice Gaussian distribution was employed to obtain the full shaping gain for lattice coding [2, 3], and to achieve the capacity of the Gaussian channel [4]. It was also used to achieve information-theoretic security in the Gaussian wiretap channel [5, 6] and in the bidirectional relay channel [7], respectively. In cryptography, the lattice Gaussian distribution has become a central tool in the construction of many primitives [8, 9, 10]. Specifically, lattice Gaussian sampling lies at the core of signature schemes in the Gentry, Peikert and Vaikuntanathan (GPV) paradigm [11]. Furthermore, lattice Gaussian sampling with a suitable variance makes it possible to solve the closest vector problem (CVP) and the shortest vector problem (SVP) [12, 13].
However, in sharp contrast to the continuous Gaussian density, it is by no means trivial even to sample from a low-dimensional discrete Gaussian distribution. For some special lattices, there are rather efficient algorithms for Gaussian sampling [4, 14]. As the default sampling algorithm for general lattices, Klein’s algorithm [15] only works when the standard deviation $\sigma = \omega(\sqrt{\log n}) \cdot \max_{1 \le i \le n} \|\hat{\mathbf{b}}_i\|$ [11], where $\omega(\sqrt{\log n})$ is a superlogarithmic function, $n$ denotes the lattice dimension and the $\hat{\mathbf{b}}_i$’s are the Gram-Schmidt vectors of the lattice basis $\mathbf{B}$. Peikert gave an efficient lattice Gaussian sampler in [16] for parallel implementation, which however requires larger values of $\sigma$. On the other hand, the lattice Gaussian sampling algorithm proposed by Aggarwal et al. in [12, 13] to solve CVP and SVP has a lower bound $2^n$ on both space and time complexity; it actually obtains samples for small $\sigma$ by combining original samples for large $\sigma$. Although the algorithm in [17] provides a trade-off between (exponential) time and space complexity, its complexity is still too high to be practical.
In order to sample from a target lattice Gaussian distribution with arbitrary $\sigma$, Markov chain Monte Carlo (MCMC) methods were introduced in [18]. In principle, MCMC randomly generates the next Markov state conditioned on the previous one; after the burn-in time, which is normally measured by the mixing time, the Markov chain reaches its stationary distribution, at which point samples from the target distribution can be obtained [19]. It has been demonstrated that Gibbs sampling, which employs univariate conditional sampling to build the chain, yields an ergodic Markov chain [20]. In [18], we proposed an independent Metropolis-Hastings (MH) algorithm incorporating Klein’s algorithm (namely, the independent MHK algorithm) to generate a proposal distribution, which is shown to be uniformly ergodic (converging exponentially fast to the stationary distribution). Meanwhile, the associated convergence rate of the Markov chain is derived, resulting in a tractable estimation of the mixing time. Unlike the algorithms of [12, 13, 17], the independent MHK sampling algorithm only requires polynomial space. In this paper, we advance the state of the art of MCMC-based lattice Gaussian sampling on several fronts.
Firstly, we refine the analysis and extend the independent MHK algorithm of [18]. We obtain the spectral gap of the transition matrix and demonstrate uniform ergodicity. We extend the independent MH algorithm to a version where Peikert’s algorithm [16] is used to generate the proposal distribution. We then compare these MCMC algorithms with rejection sampling from statistics. By deriving their rates of convergence, we show the advantage of the independent MHK. Rejection sampling achieves the same convergence rate only if its normalizing constant is carefully chosen, which is generally rather difficult.
Secondly, we apply the independent MHK algorithm to bounded distance decoding (BDD). BDD is a variant of the CVP where the input is within a certain distance to the lattice. With a careful selection of the standard deviation during the sampling process, we improve the result of Klein from to in terms of BDD [Footnote 1: In -BDD (), we are given a lattice basis and a query point , and we are asked to find a lattice point within distance from the target, where denotes the first minimum of the lattice.]. References [21, 22] achieved a larger value , at the expense of a pre-processing stage where Gaussian samples are taken from the dual lattice with standard deviation equal to its smoothing parameter. However, sampling at the smoothing parameter is in general a difficult problem with no known efficient solution. For algorithms of general SVP/CVP such as enumeration and sieving, we refer the readers to the comprehensive survey [23].
Thirdly, we examine the impact of MCMC on trapdoor sampling in the GPV paradigm. In cryptographic applications, the standard deviation $\sigma$ of the sampler is the main parameter governing the security level. Namely, the smaller $\sigma$, the higher the security. This is because for a signature system to be secure, it must be hard for an adversary to find lattice points of length about . We show that, at a moderate cost in complexity, MCMC is able to sample with smaller $\sigma$, thereby increasing the security level relative to Klein’s algorithm [11] and Peikert’s algorithm [16].
Finally, to improve the convergence rate of the Markov chain, the independent multiple-try Metropolis-Klein (MTMK) algorithm is proposed, which fully exploits the trial samples generated from the proposal distribution. Uniform ergodicity is demonstrated and the enhanced convergence rate is also given. Since independent MHK is only a special case of independent MTMK, the decoding performance can also be improved due to the usage of trial samples. The proposed sampling algorithm allows a parallel implementation and is easily adapted to MIMO detection to achieve near-optimal performance. With the development of 5G, the demand for large-scale MIMO systems will increase in the next decade, which has triggered research activities towards low-complexity decoding algorithms for large-scale MIMO detection [24, 25, 26]. Therefore, there has been considerable interest in MCMC sampling for the efficient decoding of MIMO systems [27, 28, 29, 30, 31, 32].
The rest of this paper is organized as follows. Section II introduces the lattice Gaussian distribution and briefly reviews the basics of MCMC. In Section III, we derive the spectral gaps of the Markov chains associated with independent MHK and rejection sampling-based lattice Gaussian sampling, and show their uniform ergodicity as well as convergence rates. An extension to Peikert’s algorithm is also given. Then, the decoding complexity of BDD using the independent MHK algorithm is derived in Section IV. Section V addresses trapdoor sampling using MCMC. In Section VI, the independent MTMK algorithm is proposed to further strengthen the convergence performance. Simulation results for MIMO detection are presented in Section VII. Finally, Section VIII concludes the paper.
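Notation: Matrices and column vectors are denoted by upper and lowercase boldface letters, and the transpose, inverse, pseudoinverse of a matrix $\mathbf{B}$ by $\mathbf{B}^T$, $\mathbf{B}^{-1}$, and $\mathbf{B}^{\dagger}$, respectively. $\mathbf{I}$ denotes the identity matrix. We use $\mathbf{b}_i$ for the $i$th column of the matrix $\mathbf{B}$, $\hat{\mathbf{b}}_i$ for the $i$th Gram-Schmidt vector of the matrix $\mathbf{B}$, $b_{i,j}$ for the entry in the $i$th row and $j$th column of the matrix $\mathbf{B}$. A symmetric matrix $\boldsymbol{\Sigma}$ is written as $\boldsymbol{\Sigma} > 0$ if it is positive definite. Similarly, we say $\boldsymbol{\Sigma}_1 > \boldsymbol{\Sigma}_2$ if $\boldsymbol{\Sigma}_1 - \boldsymbol{\Sigma}_2 > 0$. $\lceil x \rfloor$ denotes rounding to the integer closest to $x$. If $x$ is a complex number, $\lceil x \rfloor$ rounds the real and imaginary parts separately. In addition, we use the standard small omega notation $\omega(\cdot)$, i.e., $|\omega(g(n))| > k\,|g(n)|$ for every fixed positive number $k$. Finally, in this paper, the computational complexity is measured by the number of Markov moves.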
II Preliminaries
In this section, we introduce the background and mathematical tools needed to describe and analyze the proposed lattice Gaussian sampling algorithms.
II-A Lattice Gaussian Distribution
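Let matrix $\mathbf{B} = [\mathbf{b}_1, \ldots, \mathbf{b}_n] \subset \mathbb{R}^n$ consist of $n$ linearly independent column vectors. The $n$-dimensional lattice $\Lambda$ generated by $\mathbf{B}$ is defined by
$$\Lambda = \mathcal{L}(\mathbf{B}) = \{\mathbf{B}\mathbf{x} : \mathbf{x} \in \mathbb{Z}^n\}, \tag{1}$$
where $\mathbf{B}$ is called the lattice basis. We define the Gaussian function centered at $\mathbf{c} \in \mathbb{R}^n$ for standard deviation $\sigma > 0$ as
$$\rho_{\sigma,\mathbf{c}}(\mathbf{z}) = e^{-\frac{\|\mathbf{z} - \mathbf{c}\|^2}{2\sigma^2}}, \tag{2}$$
for all $\mathbf{z} \in \mathbb{R}^n$. When $\mathbf{c}$ or $\sigma$ are not specified, we assume that they are $\mathbf{0}$ and $1$ respectively. Then, the discrete Gaussian distribution over $\Lambda$ is defined as
$$D_{\Lambda,\sigma,\mathbf{c}}(\mathbf{x}) = \frac{\rho_{\sigma,\mathbf{c}}(\mathbf{B}\mathbf{x})}{\rho_{\sigma,\mathbf{c}}(\Lambda)} \tag{3}$$
for all $\mathbf{x} \in \mathbb{Z}^n$, where $\rho_{\sigma,\mathbf{c}}(\Lambda) \triangleq \sum_{\mathbf{B}\mathbf{x} \in \Lambda} \rho_{\sigma,\mathbf{c}}(\mathbf{B}\mathbf{x})$ is just a scaling to obtain a probability distribution. We remark that this definition differs slightly from the one in [8], where $\sigma$ is scaled by a constant factor $\sqrt{2\pi}$ (i.e., $s = \sqrt{2\pi}\sigma$). In fact, the discrete Gaussian resembles a continuous Gaussian distribution, but is only defined over a lattice. It has been shown that discrete and continuous Gaussian distributions share similar properties, if the flatness factor is small [5].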
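As a small numerical illustration of this definition, the following Python sketch tabulates the discrete Gaussian over the one-dimensional lattice of integers. It assumes the Gaussian function has the form exp(-|z - c|^2 / (2σ^2)); the function names, the truncation parameter `tail`, and the example values σ = 1.5 and c = 0.3 are illustrative choices that do not come from the paper.

```python
import math

def rho(z, sigma=1.0, c=0.0):
    """Gaussian function exp(-(z - c)^2 / (2 sigma^2)) for a scalar z."""
    return math.exp(-((z - c) ** 2) / (2.0 * sigma ** 2))

def discrete_gaussian_pmf(sigma, c, tail=12):
    """Probabilities of the discrete Gaussian over Z on a truncated window.

    The window covers roughly c +/- tail * sigma. The mass outside it is
    negligible for tail = 12, but the result is still an approximation.
    """
    lo = math.floor(c - tail * sigma)
    hi = math.ceil(c + tail * sigma)
    support = list(range(lo, hi + 1))
    weights = [rho(k, sigma, c) for k in support]
    total = sum(weights)  # truncated stand-in for the normalizing sum over Z
    return {k: w / total for k, w in zip(support, weights)}

pmf = discrete_gaussian_pmf(sigma=1.5, c=0.3)
mode = max(pmf, key=pmf.get)  # the integer closest to the center c
```

The integer closest to the center receives the largest probability, which is the property that decoding by sampling relies on.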
II-B Decoding by Sampling
Consider the decoding of an $n \times n$ real-valued system. The extension to the complex-valued system is straightforward [33]. Let $\mathbf{x} \in \mathbb{Z}^n$ denote the transmitted signal. The corresponding received signal $\mathbf{c}$ is given by
$$\mathbf{c} = \mathbf{B}\mathbf{x} + \mathbf{w}, \tag{4}$$
where $\mathbf{w}$ is the noise vector with zero mean and variance $\sigma_w^2$, and $\mathbf{B}$ is an $n \times n$ full column-rank matrix of channel coefficients. Typically, the conventional maximum likelihood (ML) decoding reads
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in \mathbb{Z}^n} \|\mathbf{c} - \mathbf{B}\mathbf{x}\|^2, \tag{5}$$
where $\|\cdot\|$ denotes the Euclidean norm. Clearly, ML decoding corresponds to the CVP. If the received signal $\mathbf{c}$ is the origin, then ML decoding reduces to SVP.
Intuitively, the CVP given in (5) can be solved by lattice Gaussian sampling. Since the distribution is centered at the query point $\mathbf{c}$, the closest lattice point to $\mathbf{c}$ is assigned the largest sampling probability. Therefore, over multiple samplings, the solution of CVP is the most likely to be returned. It has been demonstrated that lattice Gaussian sampling is equivalent to CVP via a polynomial-time dimension-preserving reduction [34]. Meanwhile, by adjusting the sample size, the sampling decoder enjoys a flexible trade-off between performance and complexity.
In [15], Klein introduced an algorithm which performs sampling from a Gaussian-like distribution (see Algorithm 1). It is shown in [15, 33, 35] that Klein’s algorithm is able to find the closest lattice point when it is close to the input vector: this technique is known as BDD in coding literature, which corresponds to a restricted variant of CVP.
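Algorithm 1 is not reproduced in this section, so the following Python sketch shows one common way to realize Klein-style sampling (a randomized nearest-plane procedure driven by a QR decomposition) and to use it as a decoder by keeping the closest of several samples. It is an illustration under stated assumptions rather than the authors' listing: the basis is assumed square and full rank, the one-dimensional sampler works on a truncated window, and all function names and parameters are hypothetical.

```python
import numpy as np

def sample_dgauss_z(center, sigma, rng, tail=10):
    """Draw an integer k with probability proportional to
    exp(-(k - center)^2 / (2 sigma^2)), restricted to a truncated window."""
    lo = int(np.floor(center - tail * sigma)) - 1
    hi = int(np.ceil(center + tail * sigma)) + 1
    ks = np.arange(lo, hi + 1)
    expo = -((ks - center) ** 2) / (2.0 * sigma ** 2)
    w = np.exp(expo - expo.max())  # shift exponents to avoid underflow
    return int(rng.choice(ks, p=w / w.sum()))

def klein_sample(B, c, sigma, rng):
    """One draw of integer coefficients x (lattice point B @ x)."""
    Q, R = np.linalg.qr(B)
    c_prime = Q.T @ c
    n = B.shape[1]
    x = np.zeros(n, dtype=int)
    for i in range(n - 1, -1, -1):  # backward, coordinate by coordinate
        center = (c_prime[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
        x[i] = sample_dgauss_z(center, sigma / abs(R[i, i]), rng)
    return x

def decode_by_sampling(B, c, sigma, num_samples, rng):
    """Keep the sampled lattice point closest to the query point c."""
    best, best_dist = None, np.inf
    for _ in range(num_samples):
        x = klein_sample(B, c, sigma, rng)
        d = np.linalg.norm(c - B @ x)
        if d < best_dist:
            best, best_dist = x, d
    return best, best_dist
```

Increasing `num_samples` raises the chance of returning the closest point at the cost of more work, which mirrors the performance-complexity trade-off described above.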
II-C Classical MH Algorithms
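In [36], the original Metropolis algorithm was extended to a more general scheme known as the Metropolis-Hastings (MH) algorithm. In particular, let us consider a target invariant distribution $\pi$ together with a proposal distribution $q(\mathbf{x}, \mathbf{y})$. Given the current state $\mathbf{x}$ for Markov chain $\mathbf{X}_t$, a state candidate $\mathbf{y}$ for the next Markov move $\mathbf{X}_{t+1}$ is generated from the proposal distribution $q(\mathbf{x}, \mathbf{y})$. Then the acceptance ratio $\alpha$ is computed by
$$\alpha(\mathbf{x}, \mathbf{y}) = \min\left\{1, \frac{\pi(\mathbf{y})\, q(\mathbf{y}, \mathbf{x})}{\pi(\mathbf{x})\, q(\mathbf{x}, \mathbf{y})}\right\}, \tag{6}$$
and $\mathbf{y}$ will be accepted as the new state with probability $\alpha$. Otherwise, $\mathbf{x}$ will be retained. In this way, a Markov chain is established with the transition probability $P(\mathbf{x}, \mathbf{y})$ as follows:
$$P(\mathbf{x}, \mathbf{y}) = \begin{cases} q(\mathbf{x}, \mathbf{y})\, \alpha(\mathbf{x}, \mathbf{y}) & \text{if } \mathbf{y} \neq \mathbf{x}, \\ 1 - \sum_{\mathbf{z} \neq \mathbf{x}} q(\mathbf{x}, \mathbf{z})\, \alpha(\mathbf{x}, \mathbf{z}) & \text{if } \mathbf{y} = \mathbf{x}. \end{cases} \tag{7}$$
It is interesting that in MH algorithms, the proposal distribution $q(\mathbf{x}, \mathbf{y})$ can be any fixed distribution from which we can conveniently draw samples. Therefore, there is large freedom in the choice of $q(\mathbf{x}, \mathbf{y})$, but it is challenging to find a suitable one with satisfactory convergence. In fact, Gibbs sampling can be viewed as a special case of the MH algorithm, whose proposal distribution is a univariate conditional distribution.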
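The accept/reject mechanism described above can be sketched in a few lines of Python. This is a generic textbook MH step, not code from the paper: the target is passed as an unnormalized weight function, and the toy example (a discrete Gaussian over the integers explored by a symmetric ±1 random walk) together with all names is chosen here purely for illustration.

```python
import math
import random

def mh_step(x, target, propose, proposal_prob, rng):
    """One Metropolis-Hastings move from the current state x.

    target(x)           unnormalized target weight of state x
    propose(x, rng)     draws a candidate y given x
    proposal_prob(x, y) probability of proposing y from x
    """
    y = propose(x, rng)
    ratio = (target(y) * proposal_prob(y, x)) / (target(x) * proposal_prob(x, y))
    alpha = min(1.0, ratio)  # acceptance ratio
    return y if rng.random() < alpha else x  # otherwise keep x

def run_chain(x0, num_moves, target, propose, proposal_prob, rng):
    x, states = x0, []
    for _ in range(num_moves):
        x = mh_step(x, target, propose, proposal_prob, rng)
        states.append(x)
    return states

# Toy target: weights exp(-(k - c)^2 / (2 sigma^2)) over the integers.
sigma, c = 2.0, 0.0
target = lambda k: math.exp(-((k - c) ** 2) / (2.0 * sigma ** 2))
propose = lambda x, rng: x + rng.choice((-1, 1))
proposal_prob = lambda x, y: 0.5 if abs(x - y) == 1 else 0.0

chain = run_chain(0, 20000, target, propose, proposal_prob, random.Random(1))
```

Only ratios of target values enter the acceptance step, so the normalizing constant of the target never needs to be computed.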
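As an important parameter to measure the time required by a Markov chain to get close to its stationary distribution, the mixing time is defined as [19]
$$t_{\text{mix}}(\epsilon) = \min\left\{t : \max_{\mathbf{x}} \|P^t(\mathbf{x}, \cdot) - \pi(\cdot)\|_{TV} \le \epsilon\right\}, \tag{8}$$
where $P^t(\mathbf{x}, \cdot)$ denotes a row of the transition matrix $\mathbf{P}$ for $t$ Markov moves and $\|\cdot\|_{TV}$ represents the total variation distance.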
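For a finite state space, the mixing time can be computed directly from its definition. The Python sketch below does so for a hypothetical two-state chain; it assumes the total variation distance is half the L1 distance between probability vectors and uses the "at most ε" convention, and the matrix, function names, and tolerance are illustrative only.

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def mixing_time(P, pi, eps, t_max=10_000):
    """Smallest t such that every row of P^t is within eps of pi in TV."""
    Pt = np.eye(P.shape[0])
    for t in range(1, t_max + 1):
        Pt = Pt @ P
        worst = max(total_variation(Pt[i], pi) for i in range(P.shape[0]))
        if worst <= eps:
            return t
    raise ValueError("chain did not mix within t_max moves")

# Toy two-state chain with stationary distribution (2/3, 1/3).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([2.0 / 3.0, 1.0 / 3.0])
t_mix = mixing_time(P, pi, eps=0.01)
```

For the lattice Gaussian case the state space is countably infinite, so this brute-force computation is not available and bounds such as those derived from the spectral gap are used instead.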
II-D Independent MHK Algorithm
From the MCMC perspective, lattice Gaussian distribution can be viewed as a complex target distribution lacking direct sampling methods. In order to obtain samples from , the independent MHK sampling was proposed in [18]. Specifically, a state candidate for the next Markov move is generated by Klein’s algorithm, via the following backward one-dimensional conditional sampling (for ):
[TABLE]
where , , and by QR decomposition with . Note that , , and are the segments of , and respectively (i.e., is a submatrix of with to in the diagonal).
Given the current state , the proposal distribution in the independent MHK sampling is given by
[TABLE]
where the proposal distribution is actually independent of . Therefore, the connection between two consecutive Markov moves is only due to the decision stage.
With the state candidate , the acceptance ratio is obtained by substituting (10) into (6)
[TABLE]
where and we note that in (6) (these notations will be followed throughout the paper). The sampling procedure is summarized in Algorithm 2. Note that the initial state for can be chosen from arbitrarily or from the output of a suboptimal algorithm.
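Since Algorithm 2 is not reproduced here, the following Python sketch illustrates the idea of independent MHK under stated assumptions. For an independent proposal, the MH ratio reduces to a ratio of importance weights π/q; for a Klein-style proposal over a square full-rank basis, that weight is proportional to the product of the one-dimensional normalizing sums encountered while sampling, which the code tracks in the log domain. All names are hypothetical, the one-dimensional sums are truncated, and the chain is started from a proposal draw for simplicity.

```python
import numpy as np

def klein_proposal(R, c_prime, sigma, rng, tail=10):
    """Klein-style draw y, plus the log of the product of the 1-D
    normalizing sums met along the way (proportional to pi(y) / q(y))."""
    n = R.shape[0]
    y = np.zeros(n, dtype=int)
    log_norm = 0.0
    for i in range(n - 1, -1, -1):
        s_i = sigma / abs(R[i, i])
        center = (c_prime[i] - R[i, i + 1:] @ y[i + 1:]) / R[i, i]
        ks = np.arange(int(np.floor(center - tail * s_i)) - 1,
                       int(np.ceil(center + tail * s_i)) + 2)
        expo = -((ks - center) ** 2) / (2.0 * s_i ** 2)
        w = np.exp(expo - expo.max())  # stabilized weights
        y[i] = rng.choice(ks, p=w / w.sum())
        log_norm += expo.max() + np.log(w.sum())
    return y, log_norm

def independent_mhk(B, c, sigma, num_moves, rng):
    """Run the chain and return the visited coefficient vectors."""
    Q, R = np.linalg.qr(B)
    c_prime = Q.T @ c
    x, log_wx = klein_proposal(R, c_prime, sigma, rng)  # initial state
    chain = []
    for _ in range(num_moves):
        y, log_wy = klein_proposal(R, c_prime, sigma, rng)  # independent of x
        if rng.random() < np.exp(min(0.0, log_wy - log_wx)):
            x, log_wx = y, log_wy  # accept the candidate
        chain.append(x.copy())
    return np.array(chain)
```

The proposal never looks at the current state, so consecutive states are coupled only through the accept/reject decision, as noted in the text.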
Thanks to the celebrated coupling technique, uniform ergodicity was demonstrated in [18]. Nevertheless, the spectral gap of the transition matrix, which serves as an important metric for the mixing time of the underlying Markov chain, has not been determined yet.
III Convergence Analysis
In this section, the spectrum of the Markov chain induced by independent MHK sampling is analyzed, followed by the extensions to Peikert’s algorithm and rejection sampling. As a common way to evaluate the mixing time, the spectral gap of the transition matrix is preferred for convergence analysis in MCMC [19]. Here, represents the second largest eigenvalue in magnitude of the transition matrix [37].
III-A Spectral Gap of Independent MHK Algorithm
Theorem 1.
Given the invariant lattice Gaussian distribution , the Markov chain induced by independent MHK sampling exhibits a spectral gap
[TABLE]
Proof.
From (10) and (11), the transition probability of each Markov move in the independent MHK sampling is given by
[TABLE]
For notational simplicity, we define the importance weight as
[TABLE]
Then the transition probability can be rewritten as
[TABLE]
Without loss of generality, we label the countably infinite state space as and assume that these states are sorted according to their importance weights, namely,
[TABLE]
From (15) and (16), the transition matrix of the Markov chain can be exactly expressed as
[TABLE]
where
[TABLE]
stands for the probability of being rejected in the decision stage with the current state for .
Let denote the vector of proposal probabilities. Then by decomposition, it follows that
[TABLE]
where and is an upper triangular matrix of the form
[TABLE]
It is well-known that for a Markov chain, the largest eigenvalue of the transition matrix always equals 1. Here, as is a common right eigenvector for both and , it naturally corresponds to the largest eigenvalue 1. Meanwhile, since the rank of is 1, the other eigenvalues of are exactly the same as those of .
Thanks to the ascending order in (16), it is easy to verify that the spectral radius is exactly given by
[TABLE]
and
[TABLE]
thereby raising the interest of identifying the value of .
Therefore, according to (17) and (19), we can easily get that
[TABLE]
In other words, the spectral gap is exactly captured by the ratio . Next, we invoke the following Lemma to lower bound the ratio for .
Lemma 1 ([18]).
In the independent MHK algorithm
[TABLE]
for all , where is defined in (12).
The proof is completed by combining (21) and (22).
∎
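The statement that the spectral gap is captured by a ratio of proposal to target probability can be checked numerically on a small finite state space. The Python sketch below builds the transition matrix of an independent MH chain and compares its spectral gap with the smallest value of q(x)/π(x). The four-state target and uniform proposal are made-up numbers for illustration; the actual lattice Gaussian state space is countably infinite.

```python
import numpy as np

def independent_mh_matrix(pi, q):
    """Transition matrix of an independent MH chain on a finite state space,
    with target pi and a state-independent proposal q."""
    n = len(pi)
    w = pi / q  # importance weights
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                P[i, j] = q[j] * min(1.0, w[j] / w[i])
        P[i, i] = 1.0 - P[i].sum()  # probability of staying put
    return P

pi = np.array([0.5, 0.3, 0.15, 0.05])     # toy target distribution
q = np.array([0.25, 0.25, 0.25, 0.25])    # toy proposal distribution
P = independent_mh_matrix(pi, q)

# Sort eigenvalue magnitudes in descending order; the first one equals 1.
eig = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
gap = 1.0 - eig[1]
min_ratio = (q / pi).min()
```

In this toy case `gap` and `min_ratio` coincide, which is the finite-state analogue of the relation used in the proof before the ratio is lower bounded.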
By using the coupling technique, it is shown in [18] that the Markov chain converges exponentially fast to the stationary distribution in total variation distance:
[TABLE]
The mixing time of the Markov chain is given by
[TABLE]
which is proportional to $1/\delta$, and becomes $O(1)$ if $\delta \to 1$.
III-B Extension to Peikert’s Algorithm
Klein’s sampling algorithm is a randomized variant of Babai’s nearest-plane algorithm for lattice decoding [38]. Babai also proposed a simpler decoding scheme by direct rounding [Footnote 2: In communications, Babai’s nearest-plane algorithm is known as successive interference cancellation (SIC) while the direct rounding algorithm is referred to as zero-forcing (ZF).], which was further randomized by Peikert in [16]. Although Peikert’s algorithm requires a higher value of , it is parallelizable and can be more attractive in practical implementation. In fact, Peikert’s algorithm can also be incorporated into the Metropolis-Hastings algorithm to overcome the limitation of .
Specifically, given the standard deviation and a basis , one chooses a positive definite matrix for (i.e., is positive definite). Then, the proposed sample is taken from the distribution , where is sampled from the continuous distribution . Note the lattice Gaussian distribution is expressed as
[TABLE]
with
[TABLE]
The joint probability distribution of and is given by
[TABLE]
where (a) is due to the symmetry of , and (b) follows from [16, Fact 2.1] with positive definite matrix and . Consequently, the marginal distribution of is
[TABLE]
As for , we have
[TABLE]
Clearly, can be used as a proposal distribution in the MH algorithm to obtain the state candidate . In this case, the acceptance ratio can be calculated by
[TABLE]
followed by a decision to accept or not. To summarize, its operation procedure is shown in Algorithm 3.
Lemma 2.
In the independent MH algorithm using Peikert’s algorithm, there exists a constant such that
[TABLE]
for all , where
[TABLE]
Proof.
To start with, we have
[TABLE]
where inequality comes from the fact that .
The Lemma is proven by showing that
[TABLE]
Here, and follow from the properties of determinant that
[TABLE]
and
[TABLE]
respectively, for square matrices and of equal sizes.
∎
To satisfy the condition that , we require
[TABLE]
where denotes the largest singular value of the basis . It is readily verified that
[TABLE]
Lemma 3.
For independent MH samplings based on Peikert’s algorithm and on Klein’s algorithm, the following relation holds:
[TABLE]
Proof.
According to (12) and (32), in order to show , we need to prove that
[TABLE]
Next, by recalling the Jacobi theta function with , we have
[TABLE]
and the left-hand side of (40)
[TABLE]
where utilizes the symmetry property of Theta series for isodual lattice
[TABLE]
Moreover, as is monotone decreasing with , the following relation holds:
[TABLE]
Hence, we finally have that the left-hand side of (40)
[TABLE]
thus completing the proof.
∎
Similarly to independent MHK, it is easy to verify that the proposed algorithm is also uniformly ergodic.
Theorem 2.
Given , the Markov chain induced by independent MH sampling using Peikert’s algorithm converges exponentially fast:
[TABLE]
By Lemma 3, we can see that the independent MH sampling based on Peikert’s algorithm converges more slowly than that based on Klein’s algorithm. This is numerically confirmed in Fig. 2 for the checkerboard lattice , where a comparison of the coefficients and is given. Clearly, over the whole range of , the independent MH-Peikert sampling requires more iterations than independent MHK.
III-C Extension to Rejection Sampling
The classic rejection sampling is able to generate independent samples from the target distribution, but requires a normalizing constant for the application of a proposal distribution [39]. Given the target distribution , its operation consists of the following three steps:
1) Generate a candidate sample from distribution using Klein’s algorithm or Peikert’s algorithm.
2) Calculate a normalizing constant such that
[TABLE]
for all .
3) Output with probability
[TABLE]
and otherwise repeat.
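The three steps above can be sketched in Python for a toy one-dimensional case. The sketch assumes the standard form of rejection sampling, in which the constant (called `omega` here, a hypothetical name) satisfies omega·q(x) ≥ π(x) for all x and a candidate is output with probability π(x)/(omega·q(x)). The target is a discrete Gaussian restricted to a small finite window and the proposal is uniform on that window, both chosen only so that the constant can be computed exactly.

```python
import math
import random

def rejection_sample(target_pmf, proposal_pmf, draw_proposal, omega, rng):
    """Return one sample and the number of candidates that were drawn.

    Requires omega * proposal_pmf(x) >= target_pmf(x) for every x.
    """
    trials = 0
    while True:
        trials += 1
        x = draw_proposal(rng)  # step 1: candidate
        accept_prob = target_pmf(x) / (omega * proposal_pmf(x))
        if rng.random() < accept_prob:  # step 3: accept, else repeat
            return x, trials

# Toy setting: target proportional to exp(-k^2 / 2) on {-3, ..., 3}.
support = list(range(-3, 4))
total = sum(math.exp(-k * k / 2.0) for k in support)
target_pmf = lambda k: math.exp(-k * k / 2.0) / total
proposal_pmf = lambda k: 1.0 / len(support)
draw_proposal = lambda rng: rng.choice(support)

# Step 2: the smallest valid constant is the maximum of target / proposal.
omega = max(target_pmf(k) / proposal_pmf(k) for k in support)

rng = random.Random(0)
draws = [rejection_sample(target_pmf, proposal_pmf, draw_proposal, omega, rng)
         for _ in range(5000)]
```

With normalized distributions, each candidate is accepted with overall probability 1/omega, so a loose constant directly slows the sampler down. Computing a tight constant needs the maximum of π/q, which is exactly the knowledge that is hard to obtain for general lattices.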
Generally, rejection sampling is not directly comparable with MCMC sampling as it requires the normalizing constant for calibration, which is not realistic in many cases of interest. Nevertheless, with a certain choice of , it is possible to interpret it as a particular MCMC algorithm.
Definition 1.
Given the target distribution , the Markov chain arising from the above rejection sampler with for all is reversible, irreducible and aperiodic, with transition probability
[TABLE]
Clearly, the algorithm based on rejection sampling converges when the first acceptance takes place. The samples after the acceptance are naturally independent and identically distributed (i.i.d.). Similarly to the setting in (16), the transition matrix of this Markov chain is exactly given by
[TABLE]
which can be further decomposed into
[TABLE]
where
[TABLE]
[TABLE]
and
[TABLE]
Here, and denote the acceptance and rejection probabilities of a new candidate in the next move.
Similarly to the analysis of independent MHK, we have the following Lemma, whose proof is straightforward and therefore omitted.
Lemma 4.
The eigenvalues ’s of the transition matrix satisfy that
[TABLE]
with
[TABLE]
for .
Furthermore, we arrive at the following Theorem.
Theorem 3.
Given the invariant lattice Gaussian distribution , the Markov chain induced by rejection sampling converges exponentially fast as
[TABLE]
where the spectral radius .
Proof.
Let denote the number of acceptances during consecutive moves. Then
[TABLE]
[TABLE]
[TABLE]
where , , and has converged to after the first acceptance. ∎
According to Theorem 3, the convergence rate of rejection sampling depends on the choice of the normalizing constant . Because for all , the spectral radius of rejection sampling achieves the minimum when , namely,
[TABLE]
thus leading to
[TABLE]
From (23) and (59), it is worth noting that only when , rejection sampling and independent MH have the same convergence rate. However, the former requires the knowledge of while the latter does not.
Remark 1.
Another algorithm for lattice Gaussian sampling based on rejection sampling was proposed in [40]. However, it was only concerned with values of required by Klein’s algorithm. Its goal is to use rejection sampling to produce exact Gaussian samples, since Klein’s algorithm only approximates the target distribution. In contrast, our goal is to sample with smaller values of . The algorithm of [40] computes a certain normalizing constant in polynomial time and needs just a few steps on average to produce an exact sample. It is possible to extend their algorithm to smaller values of , but its running time will blow up.
IV Complexity of BDD
In this section, we apply the independent MHK sampling to BDD and analyze its complexity. The analysis for independent MH-Peikert and rejection sampling is similar, by changing the value . As mentioned before, the decoding complexity of MCMC is evaluated by the number of Markov moves.
In MCMC, samples from the stationary distribution tend to be correlated with each other. Thus one leaves a gap, which is the mixing time , to pick up the desired independent samples (alternatively, one can run multiple Markov chains in parallel to guarantee i.i.d. samples). Therefore, we define the complexity of solving BDD by MCMC as follows.
Definition 2.
Let denote the Euclidean distance between the query point and the lattice with basis , and let be the lattice point achieving . The complexity (i.e., the number of Markov moves ) of solving BDD by MCMC is
[TABLE]
Then, can be upper bounded by
[TABLE]
where
[TABLE]
Theorem 4.
The complexity of solving BDD by the independent MHK algorithm is bounded above as
[TABLE]
Proof.
To start with, let us examine the numerator in (62)
[TABLE]
where we apply the Jacobi theta function [41].
By substituting (64) into (62), the complexity is upper bounded as
[TABLE]
Now, let us recall some facts about the Jacobi theta function . is monotonically decreasing with , and particularly
[TABLE]
By simple calculation, we can get that
[TABLE]
where stands for the Gamma function. Clearly, if
[TABLE]
it turns out that the following term
[TABLE]
is rather small even for values of up to hundreds (e.g., ). The key point here is that the pre-exponential factor is close to 1. For better accuracy, (or etc.) can be applied so that . More options about can be found in Table I.
Therefore, if satisfies the condition (68), namely
[TABLE]
then we have
[TABLE]
Setting , we finally arrive at the following result
[TABLE]
completing the proof. ∎
Let us highlight the significance of lattice reduction. Lattice reduction is able to significantly improve while reducing [42]. Therefore, increasing will significantly decrease the complexity shown above.
Remark 2.
In fact, such an analysis also holds for Klein’s algorithm, where the probability of sampling follows a Gaussian-like distribution [15]
[TABLE]
Klein chose , which corresponds to complexity. Here, we have shown that the decoding complexity can be further reduced to , by setting . With the help of HKZ reduction, [43]. Thus, Klein’s algorithm can solve the -BDD with in polynomial time, while our result shown in (72) improves it to .
According to (63), we have
[TABLE]
where . Clearly, the decoding radius increases with , implying a flexible trade-off between the decoding performance and complexity. In addition, the significance of lattice reduction can be seen due to an increased value of .
V Trapdoor Sampling
The core technique underlying GPV’s signature scheme is discrete Gaussian sampling over a trapdoor lattice [11]. Its security crucially relies on the property that the output distribution of discrete Gaussian sampling is oblivious to any particular basis used in the sampling process, therefore preventing leakage of the private key. The original GPV signature scheme was based on Klein’s algorithm, which was subsequently extended to Peikert’s algorithm [16] (see also [44, Chap. 6] for sampling over structured lattices). In fact, any good Gaussian sampling algorithm can be applied to GPV signatures. In this section, we demonstrate the security advantage of MCMC in GPV signatures, thanks to the smaller parameters it can reach.
Firstly, we provide a high-level introduction to the GPV signature (see [11, 16] for details). In key generation, one generates a hard public basis for a random lattice , together with a short private basis of . The public basis serves as the public key, while the private basis serves as the private key. Given a message (or rather a digest of ), one uses the private basis to sample a point from with parameter . The signature of is . The verifier checks that is short and that using the public basis.
It is shown in [11] that the security of GPV signing can be reduced to the hardness of the inhomogeneous short integer solution (ISIS) problem [Footnote 3: In the language of coding theory, this is to find a short vector in a coset of a linear code.] with approximation factor . Therefore, the width is the most important property of a discrete Gaussian sampler in this context.
Obviously, there is a trade-off between security and running time in trapdoor sampling with MCMC. A small parameter gives higher security, but requires a longer running time. Next, we examine the impact of decreasing on the mixing time. Again, we focus on the independent MHK algorithm. Recall that its mixing time is proportional to
[TABLE]
Our intuition here is that if a good basis is available (as in the case of trapdoor sampling), then will not blow up as grows. To give an impression, Fig. 3 shows as a function of for checkerboard lattice with and dB, using its well-known basis [41, p.117, (86)]. It is seen that merely grows to 12 for up to 1000.
What if ? Then the denominator of (75) can be unpredictable in general. Fortunately, it can be bounded if is above the smoothing parameter. Recall that for a lattice and for , the smoothing parameter is defined as the smallest such that [Footnote 4: Note again the difference from the definition in [8], where is scaled by a constant factor .]. If , we have , .
Here, we are concerned with the parameter region , below GPV’s parameter [11] but above the smoothing parameter. This is because we anticipate only moderate growth in mixing time but significant increase of security for values of just below GPV’s parameter.
Let denote the subset of indexes with (i.e., ), , . It is not difficult to derive the following bound, similarly to [18, Proposition 4]:
[TABLE]
where we use the identity and assume in the second step, and in the last step.
Particularly, if for some , we derive
[TABLE]
Again, our key observation is that the mixing time grows rather slowly for values of that are not too small. For example, when , we have for . This means that with roughly iterations, our MCMC sampler is able to reduce the parameter from to . Therefore, if one is willing to use a slower signature scheme in return for higher security, MCMC offers such an option.
Example 1 (FALCON).
FALCON [45] is a GPV signature scheme instantiated by NTRU lattices. Let be a power of two, where is Euler’s totient function, . The secret key consists of two polynomials and in the ring where is invertible. One then finds and such that
[TABLE]
The NTRU lattice of dimension is generated by the private basis
[TABLE]
where denotes an nega-cyclic matrix whose first row consists of the coefficients of a polynomial. The public basis is given by
[TABLE]
where . Both bases and generate the same lattice
[TABLE]
We consider the parameters and in FALCON. The coefficients of polynomials and are randomly sampled from . For a particular instance randomly generated, we find . In Fig. 4, we show as a function of the term in (V), which characterizes the complexity above the smoothing parameter. It is seen that MCMC is able to significantly reduce the parameter , with only a moderate increase in complexity. Specifically, the term merely grows to about 20, even if is halved relative to . Recall that GPV sampling requires .
Note that it is possible for MCMC to incorporate the fast Fourier sampler [45], which would speed up the sampling process for structured lattices. The security levels of various samplers have been evaluated in [44, Chap. 6]. We leave evaluation of the concrete security of MCMC samplers to future work.
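The nega-cyclic matrices used in the NTRU bases above can be illustrated with a short sketch. It assumes the common convention that row i holds the coefficients of x^i times the polynomial, reduced modulo x^n + 1, so that the first row is the polynomial itself. The function names and the toy polynomials are hypothetical and are not taken from FALCON or from this paper.

```python
import numpy as np

def negacyclic_matrix(a):
    """Return the n x n matrix whose i-th row holds the coefficients of
    x^i * a(x) mod (x^n + 1); the first row is a itself."""
    row = np.asarray(a, dtype=np.int64)
    rows = []
    for _ in range(len(row)):
        rows.append(row.copy())
        # Multiply by x: shift right, wrap the top coefficient with a sign flip.
        row = np.concatenate(([-row[-1]], row[:-1]))
    return np.array(rows)

def polymul_negacyclic(a, b):
    """Schoolbook product a(x) * b(x) mod (x^n + 1)."""
    n = len(a)
    out = np.zeros(n, dtype=np.int64)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj  # x^n = -1
    return out

# Toy polynomials of degree < 4 (hypothetical, far smaller than FALCON sizes).
f = [1, -1, 0, 2]
g = [0, 3, 1, -1]
product_via_matrix = np.array(f) @ negacyclic_matrix(g)
product_direct = polymul_negacyclic(f, g)
```

Under this convention, multiplying a coefficient row vector by the matrix is the same as multiplying the two polynomials in the ring, which is why a block matrix of such blocks can generate the same lattice as the polynomial description.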
VI Multiple-Try Metropolis-Klein Algorithm
In this section, the independent multiple-try Metropolis-Klein (MTMK) algorithm is proposed to improve mixing. We first prove its validity and then show its uniform ergodicity with an improved convergence rate.
VI-A Multiple-Try Metropolis Method
Rather than directly generating the state candidate from the proposal distribution , the multiple-try Metropolis (MTM) method selects among a set of i.i.d. trial samples from , which significantly expands the search region of proposals [46]. In particular, the MTM method consists of the following steps:
1) Given the current state , draw i.i.d. state candidates from the proposal distribution .
2) Select among {} with probability proportional to the weight
[TABLE]
where is a nonnegative symmetric function of and defined initially.
3) Draw i.i.d. reference candidates from the proposal distribution and let .
4) Accept as the state of , i.e., with probability
[TABLE]
otherwise, with probability , let .
By exploring the search region more thoroughly, MTM can improve convergence. Because each move draws on a number of trial samples generated from the proposal distribution, the Markov chain can take a large jump in a single move without lowering the acceptance rate. It should be noted that the reference samples ’s are involved only to ensure the validity of MTM, i.e., to satisfy the detailed balance condition [46]
[TABLE]
Clearly, the efficiency of MTM relies on the number of trial samples , while traditional MH sampling is the special case with (a single trial sample). Similar to MH sampling, there is considerable flexibility in the choice of the proposal distribution in MTM [47]. In fact, it is even possible to use different proposal distributions to generate trial samples without altering the ergodicity of the Markov chain [48]. Meanwhile, the nonnegative symmetric function in (78) is also flexible, where the only requirement is that whenever .
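The four MTM steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the weight is taken in the standard MTM form w(a, b) = pi(a) T(a, b) lambda(a, b) with lambda set to 1, the target is a one-dimensional discrete Gaussian over the integers, and the proposal is a symmetric uniform random walk. The names `mtm_step` and `run_mtm_demo`, the symbol `k` for the number of trials, and all parameter values are hypothetical.

```python
import math
import random

def mtm_step(x, log_pi, propose, log_T, k, rng):
    """One multiple-try Metropolis move with lambda(x, y) = 1 (assumed)."""
    def weight(a, b):
        # w(a, b) = pi(a) * T(a, b) * lambda(a, b), with lambda = 1
        return math.exp(log_pi(a) + log_T(a, b))

    # Step 1: draw k trial candidates from T(x, .).
    trials = [propose(x, rng) for _ in range(k)]
    w_trials = [weight(y, x) for y in trials]
    # Step 2: select one trial with probability proportional to its weight.
    y = rng.choices(trials, weights=w_trials, k=1)[0]
    # Step 3: draw k - 1 reference candidates from T(y, .); the last one is x.
    refs = [propose(y, rng) for _ in range(k - 1)] + [x]
    w_refs = [weight(r, y) for r in refs]
    # Step 4: accept y with the generalized Metropolis ratio.
    accept = min(1.0, sum(w_trials) / sum(w_refs))
    return y if rng.random() < accept else x

def run_mtm_demo(n_steps=20000, k=4, sigma=2.0, half_width=3, seed=0):
    """Run MTM on pi(x) proportional to exp(-x^2 / (2 sigma^2)) over Z."""
    rng = random.Random(seed)
    log_pi = lambda x: -x * x / (2.0 * sigma * sigma)
    # Symmetric random walk: uniform on {x - half_width, ..., x + half_width}.
    propose = lambda x, r: x + r.randint(-half_width, half_width)
    log_T = lambda a, b: -math.log(2 * half_width + 1)
    x, samples = 0, []
    for _ in range(n_steps):
        x = mtm_step(x, log_pi, propose, log_T, k, rng)
        samples.append(x)
    return samples
```

With `k=1` the reference set reduces to the current state alone and the move becomes an ordinary Metropolis-Hastings step, which matches the special case noted above.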
VI-B The Proposed Algorithm
With the great flexibility offered by and , we now propose the independent multiple-try Metropolis-Klein (MTMK) algorithm, which is described by the following steps:
1) Given the current state , use Klein’s algorithm to draw i.i.d. state candidates from the independent proposal distribution in (10)
[TABLE]
2) Let . Then select among {} with probability proportional to the importance weight
[TABLE]
3) Accept as the state of with acceptance rate
[TABLE]
otherwise, with probability , let .
In the proposed algorithm, the basic formulation of MTM is modified in three aspects. First, Klein’s algorithm is applied to generate trial state candidates from the independent proposal distribution . Then, by setting , becomes the importance weight of that we have defined in (14). Finally and interestingly, thanks to the independent proposals, the generation of reference samples ’s can be removed without changing the ergodicity of the chain.
In the case of independent proposals, because both the trial samples ’s and the reference samples ’s are generated independently from the identical distribution , the generation of reference samples can be greatly simplified by just setting for and . Actually, with the same arguments, the trial samples generated in previous Markov moves can also be reused, as shown in [49].
It is well known that a Markov chain which is irreducible and aperiodic will be ergodic if the detailed balance condition is satisfied [19]. Since irreducibility and aperiodicity are easy to verify, we show the validity of the proposed algorithm by demonstrating the detailed balance condition.
Theorem 5.
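The three steps of the proposed algorithm, with the unselected trial samples reused as reference samples, can be sketched as follows. Several assumptions are made loudly here, since the displayed formulas are not reproduced in this text: the importance weight is taken as w(x) = pi(x) / q(x), the acceptance ratio is taken as (w(y) + S) / (w(x) + S) with S the sum of the weights of the unselected trials, Klein's algorithm is run on the QR factor of the basis, and each one-dimensional discrete Gaussian is sampled by enumerating a truncated window. The basis, center, standard deviation, and all function names are hypothetical.

```python
import math
import numpy as np

DEMO_B = np.array([[3.0, 1.0], [1.0, 2.0]])  # hypothetical basis (columns)
DEMO_C = np.array([0.4, -1.3])               # hypothetical Gaussian center
DEMO_SIGMA = 1.5                             # hypothetical standard deviation

def sample_dgauss_1d(center, s, rng, tail=10):
    """Sample a 1-D discrete Gaussian over Z on a +/- tail*s window
    (a truncation); returns the sample and its log-probability."""
    support = np.arange(math.floor(center - tail * s),
                        math.ceil(center + tail * s) + 1)
    logw = -(support - center) ** 2 / (2 * s * s)
    p = np.exp(logw - logw.max())
    p /= p.sum()
    idx = rng.choice(len(support), p=p)
    return int(support[idx]), math.log(p[idx])

def klein_sample(R, c_rot, sigma, rng):
    """Klein's algorithm on upper-triangular R: returns x and log q(x)."""
    n = R.shape[0]
    x = np.zeros(n, dtype=np.int64)
    log_q = 0.0
    for i in range(n - 1, -1, -1):  # backward coordinate-wise sampling
        center = (c_rot[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
        x[i], lp = sample_dgauss_1d(center, sigma / abs(R[i, i]), rng)
        log_q += lp
    return x, log_q

def log_weight(x, log_q, R, c_rot, sigma):
    """log of w(x) = pi(x) / q(x), up to an additive constant."""
    return -np.sum((R @ x - c_rot) ** 2) / (2 * sigma ** 2) - log_q

def mtmk_step(x, logw_x, R, c_rot, sigma, k, rng):
    """One independent MTMK move; returns the new state and its log-weight."""
    trials = [klein_sample(R, c_rot, sigma, rng) for _ in range(k)]
    logw = np.array([log_weight(y, lq, R, c_rot, sigma) for y, lq in trials])
    w = np.exp(logw - logw.max())
    c_idx = rng.choice(k, p=w / w.sum())
    # Reuse the k - 1 unselected trials as references, plus the current state.
    shift = max(logw.max(), logw_x)
    others = np.exp(np.delete(logw, c_idx) - shift).sum()
    num = np.exp(logw[c_idx] - shift) + others
    den = np.exp(logw_x - shift) + others
    if rng.random() < min(1.0, num / den):
        return trials[c_idx][0], logw[c_idx]
    return x, logw_x

def run_mtmk_demo(n_steps=2000, k=3, seed=0):
    """Run the chain on the demo lattice; returns integer coefficient vectors."""
    rng = np.random.default_rng(seed)
    Q, R = np.linalg.qr(DEMO_B)
    c_rot = Q.T @ DEMO_C
    x, lq = klein_sample(R, c_rot, DEMO_SIGMA, rng)
    logw_x = log_weight(x, lq, R, c_rot, DEMO_SIGMA)
    out = np.empty((n_steps, 2), dtype=np.int64)
    for t in range(n_steps):
        x, logw_x = mtmk_step(x, logw_x, R, c_rot, DEMO_SIGMA, k, rng)
        out[t] = x
    return out
```

The `k` calls to `klein_sample` inside one move do not depend on the current state, so in this sketch they could be issued in parallel or drawn ahead of time. With `k=1` the ratio collapses to w(y) / w(x), the independent Metropolis-Hastings case.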
Given the target lattice Gaussian distribution , the Markov chain induced by the independent MTMK algorithm is ergodic.
Proof.
To start with, let us specify the transition probability of the underlying Markov chain. For ease of presentation, we only consider the case of , since the case is trivial. The transition probability can be expressed as
[TABLE]
Here, represents the probability of accepting as the new state of given the previous one when the th candidate among ’s is selected. Moreover, as is exchangeable and independent, it follows that by symmetry, namely,
[TABLE]
In contrast to MH algorithms, the generation of the state candidate for Markov move in MTM actually follows a distribution formed by and together [46]. More precisely, can be further expressed as (86), where the terms inside the sum correspond to , and respectively.
From (86), it is straightforward to verify the term is symmetric in and , namely
[TABLE]
Then, by simple substitution, the detailed balance condition is satisfied as
[TABLE]
completing the proof. ∎
VI-C Convergence Analysis
Theorem 6.
Given the invariant lattice Gaussian distribution , the Markov chain induced by the independent MTMK sampling algorithm converges exponentially fast to the stationary distribution:
[TABLE]
with
[TABLE]
The proof of Theorem 6 is provided in Appendix A.
From (90), it can be observed that with the increase of the trial sample size , the exponential decay coefficient will approach 1. In other words, with a sufficiently large , sampling from the target distribution can be realized efficiently. More importantly, the generation of trial samples at each Markov move not only allows a fully parallel implementation, but also can be carried out in a preprocessing stage, which is beneficial in practice.
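To get a feel for how the number of trial samples affects the convergence bound, the following sketch tabulates a geometric bound of the form (1 - d)^t. It assumes the coefficient takes the form d = k / (k - 1 + 1/delta), where delta is the coefficient of independent MHK and k is the number of trials. This form is an assumption of the sketch, chosen because it equals delta at k = 1 and tends to 1 as k grows, as described above; it should be checked against (90) before being relied on. The function names are hypothetical.

```python
import math

def delta_mtmk(delta, k):
    """Assumed coefficient k / (k - 1 + 1/delta); equals delta when k = 1."""
    return k / (k - 1 + 1.0 / delta)

def mixing_time_bound(delta, k, eps):
    """Smallest integer t with (1 - delta_mtmk)^t <= eps."""
    d = delta_mtmk(delta, k)
    if d >= 1.0:
        return 1  # the bound is already met after a single move
    return math.ceil(math.log(eps) / math.log(1.0 - d))

# Illustrative values only: delta = 0.1, target accuracy eps = 0.01.
bounds = {k: mixing_time_bound(0.1, k, 0.01) for k in (1, 2, 5, 10, 50)}
```

Under this assumed form the bound falls quickly for the first few extra trials and then flattens, which is the usual reason for weighing the per-move cost of more trials against the reduction in the number of moves.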
Now, given , the mixing time of the underlying Markov chain can be estimated. Specifically, according to (8) and (89), we obtain
[TABLE]
where we again use the bound for in . Clearly, the mixing time is proportional to , and becomes if . Overall, compared with the mixing time given in (24), the mixing time of the independent MTMK is significantly reduced by a factor of . Since the independent MTMK inherits all the formulations of the independent MHK, we have
[TABLE]
for .
Following the aforementioned derivation, the decoding radius of the independent MTMK algorithm can be easily obtained as
[TABLE]
Remark 3.
Although the independent MTMK algorithm is able to reduce the mixing time, the complexity of each move increases because multiple trial samples must be generated. Therefore, parallel implementation or preprocessing is highly desirable to ease the complexity burden.
Moreover, it is possible to have a varying at each Markov move, thereby resulting in an adaptive independent MTMK algorithm as
[TABLE]
where and denotes the size of trial samples at each Markov move [50].
VII Experiments of MIMO Detection
In this section, the performance of the MCMC decoding algorithms is evaluated in MIMO detection. Specifically, we present simulation results for an MIMO system with a square channel matrix. Here, the th entry of the transmitted signal , denoted as , is a modulation symbol taken independently from an -QAM constellation with Gray mapping. Meanwhile, we assume a flat fading environment, where the channel matrix contains uncorrelated complex Gaussian fading gains with unit variance and remains constant over each frame duration. Let denote the average power per bit at the receiver, then the signal-to-noise ratio (SNR) where is the modulation level and is the noise power. Then, we can express the system model as
[TABLE]
Typically, in the case of Gaussian noise with zero mean and variance , it follows from (72) that
[TABLE]
as by the law of large numbers. Therefore, the decoding complexity decreases with the SNR. Note that the noise variance is different from the standard deviation of the lattice Gaussian distribution. (In [27], the noise variance is used as the sampling variance, but this would lead to a stalling problem at high SNRs [51].)
On the other hand, soft-output decoding for MIMO bit-interleaved coded modulation (BICM) systems is also possible using the samples generated by MCMC. Specifically, the sample candidates can be used to approximate the log-likelihood ratio (LLR), as in [52]. For bit , the approximated LLR is computed as
[TABLE]
where is the th information bit associated with sample . The notation means the set of all vectors for which .
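A list-based LLR approximation of this kind can be sketched as follows. The sketch assumes a real-valued model in which each candidate contributes exp(-d / (2 * noise_var)), where d is its squared Euclidean distance to the received vector, and it assumes the sign convention LLR = log P(bit = 1) - log P(bit = 0). The clipping value used when no candidate carries a given bit value is a common practical device and is not taken from the paper. The function name and argument layout are hypothetical.

```python
import numpy as np

def _logsumexp(v):
    """Numerically stable log(sum(exp(v))) for a non-empty 1-D array."""
    m = np.max(v)
    return m + np.log(np.sum(np.exp(v - m)))

def approx_llr(bits, sq_dists, noise_var, clip=20.0):
    """Approximate per-bit LLRs from a list of distinct sample candidates.

    bits      : (num_candidates, num_bits) array of 0/1 labels
    sq_dists  : squared distance of each candidate to the received vector
    noise_var : per-dimension noise variance (real-valued model, assumed)
    clip      : magnitude used when one bit value is absent from the list
    """
    bits = np.asarray(bits, dtype=bool)
    metric = -np.asarray(sq_dists, dtype=float) / (2.0 * noise_var)
    llr = np.empty(bits.shape[1])
    for i in range(bits.shape[1]):
        ones, zeros = metric[bits[:, i]], metric[~bits[:, i]]
        if ones.size == 0:
            llr[i] = -clip          # no candidate has bit i equal to 1
        elif zeros.size == 0:
            llr[i] = clip           # no candidate has bit i equal to 0
        else:
            llr[i] = np.clip(_logsumexp(ones) - _logsumexp(zeros), -clip, clip)
    return llr

# Two hypothetical candidates with two bits each.
example_llr = approx_llr([[1, 0], [0, 0]], [1.0, 3.0], noise_var=1.0)
```

Because an MCMC chain can visit the same point many times, the candidate list would typically be deduplicated before this computation so that repeated samples are not counted twice.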
Fig. 5 shows the bit error rate (BER) of MCMC decoding in a uncoded MIMO system with 16-QAM, where all the samples generated by MCMC algorithms are taken into account for decoding. This corresponds to a lattice decoding scenario with dimension . The performance of zero-forcing (ZF) and maximum-likelihood (ML) decoding is shown as a benchmark. For a fair comparison, sequential Gibbs sampling is applied here, which performs 1-dimensional conditional sampling of in a backward order (a forward update of in sequential Gibbs sampling is also possible), completing a full iteration [27]. This corresponds to one Markov move in the independent MHK and MTMK algorithms, which also update components of in one iteration.
As expected, with Markov moves (i.e., iterations), independent MHK outperforms Gibbs sampling. As has a vital impact on the sampling algorithms, Gibbs sampling is illustrated by tuning with different values. Note that the detection performance may be affected due to the finite constellation. Furthermore, as shown in (74), with the help of LLL reduction, the decoding radius of the independent MHK sampling is significantly enlarged by a larger size of , thereby leading to much better decoding performance. For comparison, LLL reduction is applied in Gibbs sampling as a preprocessing stage to yield a high-quality starting point. Additionally, compared to independent MHK, further decoding gain can be obtained by the independent MTMK algorithm, where cases with and trial samples are illustrated respectively.
On the other hand, in Fig. 6, the BERs of MCMC sampling detectors are evaluated against the number of Markov moves (i.e., iterations) in a uncoded MIMO system with 16-QAM. The SNR is fixed at dB. Clearly, the performance of all the MCMC detectors improves with the number of Markov moves. Meanwhile, with the same number of Markov moves, a substantial performance gain is obtained by LLL reduction. By increasing the number of trial samples, better decoding performance can be obtained by the independent MTMK algorithm due to the larger decoding radius shown in (93).
VIII Conclusions
In this paper, MCMC-based lattice Gaussian sampling was studied in detail. The spectral gap of the transition matrix of the independent MHK algorithm was derived and analyzed, which leads to a tractable exponential convergence rate of the Markov chain. A comparison with the extensions to Peikert’s algorithm and rejection sampling illustrated the advantages of independent MHK. With the tractable mixing time, the decoding complexity of BDD using MCMC was derived and a trade-off between the decoding radius and complexity was established. The potential of MCMC was further demonstrated in trapdoor sampling. After that, by exploiting the potential of trial samples, the independent MTMK algorithm was proposed to enhance the convergence. It supports parallel implementation due to the independent proposal distribution, which makes the independent MTMK algorithm promising in practice.
Acknowledgment
The authors would like to thank Dr. Thomas Prest for helpful discussions.
Appendix A Proof of Theorem 6
Proof.
To begin with, let us take a careful look at the term from (86). Here, for ease of presentation, we define
[TABLE]
and
[TABLE]
Meanwhile, because the trial samples from the proposal distribution are independent of each other, a set is defined which contains the trial samples , .
Then we can express and as
[TABLE]
and
[TABLE]
Here, represents a probability distribution that takes all , into account as a whole. On the other hand, and stand for the functions about , namely,
[TABLE]
and
[TABLE]
where
[TABLE]
Now, let us focus on the term , and we arrive at
[TABLE]
Here, represents the expectation of function when is sampled from the distribution , and comes from Jensen’s inequality in the multi-variable case. Moreover, thanks to the independent samples from , follows the derivations shown below,
[TABLE]
Similar to , we can rewrite as
[TABLE]
Therefore, from (105) and (107), we get
[TABLE]
where and
[TABLE]
for from (22) in Lemma 1. From (108), it is straightforward to see that all the Markov transitions have a component of size in common. Then, uniform ergodicity can be easily demonstrated through spectral gap or coupling technique, which is omitted here for simplicity.
∎
References
- [1] W. Banaszczyk, “New bounds in some transference theorems in the geometry of numbers,” Math. Ann., vol. 296, pp. 625–635, 1993.
- [2] G. Forney and L.-F. Wei, “Multidimensional constellations–Part II: Voronoi constellations,” IEEE J. Sel. Areas Commun., vol. 7, no. 6, pp. 941–958, Aug. 1989.
- [3] F. R. Kschischang and S. Pasupathy, “Optimal nonuniform signaling for Gaussian channels,” IEEE Trans. Inform. Theory, vol. 39, pp. 913–929, May 1993.
- [4] C. Ling and J.-C. Belfiore, “Achieving the AWGN channel capacity with lattice Gaussian coding,” IEEE Trans. Inform. Theory, vol. 60, no. 10, pp. 5918–5929, Oct. 2014.
- [5] C. Ling, L. Luzzi, J.-C. Belfiore, and D. Stehlé, “Semantically secure lattice codes for the Gaussian wiretap channel,” IEEE Trans. Inform. Theory, vol. 60, no. 10, pp. 6399–6416, Oct. 2014.
- [6] H. Mirghasemi and J. C. Belfiore, “Lattice code design criterion for MIMO wiretap channels,” in Proc. IEEE Information Theory Workshop (ITW), Oct. 2015, pp. 277–281.
- [7] S. Vatedka, N. Kashyap, and A. Thangaraj, “Secure compute-and-forward in a bidirectional relay,” IEEE Trans. Inform. Theory, vol. 61, no. 5, pp. 2531–2556, May 2015.
- [8] D. Micciancio and O. Regev, “Worst-case to average-case reductions based on Gaussian measures,” in Proc. Ann. Symp. Found. Computer Science, Rome, Italy, Oct. 2004, pp. 372–381.
