Eficient Monte Carlo Simulation of the Left Tail of Positive Gaussian Quadratic Forms
Chaouki Ben Issaid, Mohamed-Slim Alouini, Raul Tempone

TL;DR
This paper introduces an efficient importance sampling method for accurately estimating extremely small probabilities in the left tail of Gaussian quadratic forms, significantly reducing computational effort.
Contribution
It presents a novel importance sampling estimator with bounded relative error for Gaussian quadratic forms, improving efficiency over naive Monte Carlo methods.
Findings
Estimator achieves bounded relative error.
Significantly fewer simulation runs needed.
Effective in both real and complex, central and non-central cases.
Abstract
Estimating the left tail of quadratic forms in Gaussian random vectors is of major practical importance in many applications. In this paper, we propose an efficient and robust importance sampling estimator that is endowed with the bounded relative error property. This property significantly reduces the number of simulation runs required by the proposed estimator compared to naive Monte Carlo. Thus, our importance sampling estimator is especially useful when the probability of interest is very small. Selected simulation results are presented to illustrate the efficiency of our estimator compared to naive Monte Carlo in both central and non-central cases, as well as both real and complex settings.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMathematical Approximation and Integration · Statistical Methods and Inference · Matrix Theory and Algorithms
Efficient Monte Carlo Simulation of the Left Tail of Positive Gaussian Quadratic Forms
Chaouki Ben Issaid
Mohamed-Slim Alouini
Raúl Tempone
King Abdullah University of Science and Technology (KAUST), Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia
Alexander von Humboldt Professor in Mathematics for Uncertainty Quantification, RWTH Aachen University, 52062 Aachen, Germany
Abstract
Estimating the left tail of quadratic forms in Gaussian random vectors is of major practical importance in many applications. In this paper, we propose an efficient and robust importance sampling estimator that is endowed with the bounded relative error property. This property significantly reduces the number of simulation runs required by the proposed estimator compared to naive Monte Carlo. Thus, our importance sampling estimator is especially useful when the probability of interest is very small. Selected simulation results are presented to illustrate the efficiency of our estimator compared to naive Monte Carlo in both central and non-central cases, as well as both real and complex settings.
keywords:
Importance sampling, left tail, positive quadratic forms, Gaussian random vectors, bounded relative error.
1 Introduction
Quadratic forms can appear when the effect of inequality between errors in terms of variance and correlation is examined in a two-way analysis of variance [1], when the constrained least-squares estimator is studied [2], and during statistical hypothesis testing. Many test statistics,such as the test statistics in covariance structure analysis [4], and the general likelihood ratio statistic [5], can be expressed in terms of quadratic forms. These tests have a wide range of applicability. For instance, the multilocus association test for the genetic dissection of complex diseases in genetic studies [6], the spectral analysis of the Wishart ensemble or counting string vacua in fields such as string theory [7], and non-coherent detection [8]-[9] and combining diversity [10, Chap. 14] in communication theory.
Gurland investigated the distribution of quadratic forms and ratios of quadratic forms [14, 15]. The author presented these distributions in terms of an infinite sum involving Laguerre Polynomials provided that the semi-moments are known. However, Gurland’s expression of the coefficients is not very suitable for computation, and the estimate of the truncation error is given under certain assumptions. Other works, such as Ruben [16, 17] and Shah [18, 19], presented the distribution in terms of MacLaurin series or densities. As noted by Shah [19], most of these expansions are not practical, or some of them do not provide an estimate of the truncation errors. Kotz et al. unified these approaches in their works [20, 21] and derived a series representation for both the central and non-central cases.
In general, the exact distribution of a linear combination of independent chi-square variates is a challenging task. In fact, various approximations, have been proposed in the literature, for example, Imhof [22], Davies [23], and Solomon and Stephens [24]. Based on the work of Imhof [22], Davies [23] presented a numerical approach to inverting the characteristic function of a real random variable with the aim of computing its left tail. The author showed that the method accurately produces the distribution of a central chi-squared random variable for various numbers of degrees of freedom. Rice (1980) [3] presented a generalization of this approach, including two numerical integration methods of inverting the characteristic function. As Bausch commented [7], when the probability of interest is very small, most of the existing methods fail to give an accurate result.
It is widely known that when the number of simulation runs of the model is limited and the probability is small (i.e., the occurrence of an event is rare), the naive Monte Carlo (MC) estimator is expensive. As an alternative, we propose in this work an efficient importance sampling (IS) estimator to estimate the probability of interest. The IS method is often used to estimate rare events probabilities. By modifying the dynamics of the simulations, i.e. by introducing a new change of probability measure, the rare event is no longer rare. Therefore, the IS method aims to reduce the number of required simulation runs given a certain confidence interval. However, proposing a poor choice of the new PDF will lead to a large likelihood ratio and, thus, to an estimator with a variance greater than the original MC estimator. That being said, we note that constructing a good biased distribution is the core of importance sampling. To the best of our knowledge, only two works proposing IS schemes for the purpose of computing tails of quadratic forms in Gaussian random vectors have been previously published [25, 26]. In those works, the authors were interested in the right tail and implemented IS combined with the cross-entropy method. However, the authors did not provide any efficiency analysis of their estimators. In this work, we are interested in estimating the left tail of positive quadratic forms in Gaussian random vectors using IS. We also show that the proposed IS scheme is endowed with the bounded relative error property.
The rest of this paper is organized as follows. First, we briefly describe the problem setting, and provide a lower bound for the probability of interest. In Section 2, we prove the efficiency of our proposed estimator. After reviewing some basic notions of IS in Section 3, we present our approach to estimating the probability of interest in Section 4. We show some simulation results related to selected example of applications prior to concluding the paper. In each example, we compare the computational efficiency of our approach to that of naive MC.
2 Problem Setting
Let be a Gaussian random vector with PDF
[TABLE]
where is the mean vector, is the covariance matrix, assumed to be strictly positive definite, and represents the determinant of a matrix. For a given positive definite matrix and a threshold , we aim to introduce an efficient IS scheme for computing the left tail of the quadratic form , i.e.,
[TABLE]
as .
Before giving a lower bound on the probability , we re-write its expression more conveniently. First, we write , where is a standard Gaussian vector. Then, we have [11, Ch. 3]
[TABLE]
where .
Note that is a symmetric matrix, thus using the spectral theorem, there exists an orthogonal matrix and a diagonal matrix such that . Now, let , then, we have , since the matrix is a positive definite matrix. Therefore, the eigenvalues are non-negative. Going back to (2) and replacing by its spectral decomposition, we get
[TABLE]
where and . We note that is a Gaussian random vector with zero mean and a covariance matrix because the matrix is orthogonal, i.e., , where is the identity matrix. Now, we return to the probability of interest and re-write it as [11, Ch. 3]
[TABLE]
where are independent Gaussian RVs with zero mean and unit variance. At a higher level of abstraction, this amounts to determining the CDF of a linear combination of non-central chi-squared RVs with degree 1. In the remainder of this paper, we consider the above expression of . The following proposition gives a lower bound on .
Remark 1**.**
If the matrix is positive semi-definite, then is also positive semi-definite. Without loss of generality, we assume that the non-zero eigenvalues are , where . In this case, the probability is
[TABLE]
In the rest of this paper, we assume that , i.e., the positive definite case, but the same reasoning applies when we simply replace with in the positive semi-definite case.
Proposition 1**.**
Let be a Gaussian random vector with mean and covariance matrix , and let be a given matrix. For a fixed threshold , we have
[TABLE]
where is the generalized Marcum-Q function defined as [12, Eq.(2)]
[TABLE]
Proof.
In this proof, we consider the expression of
[TABLE]
We have
[TABLE]
Using the independence of and, thus, the independence of , we can write
[TABLE]
where is the CDF of the RV , . This corresponds to the CDF of a non-central Chi-squared RV with 1 degree of freedom. Therefore, we can write
[TABLE]
∎
3 Review of IS
Let denote the PDF of ; then, we can write , where is the indicator function and is the expectation w.r.t. the probability measure under which the PDF of is . Therefore, the naive MC estimator of is given by
[TABLE]
where is the number of MC samples, and are i.i.d. realizations of the RV . For each realization of , the sequence is sampled independently according to the PDF .
When dealing with rare event simulations, reducing the variance of the estimator of the quantity of interest causes the number of simulation runs required to achieve a certain accuracy level to become smaller. The IS method is a variance reduction technique that can be used to evaluate the probability of rare events. The core of the IS method is to propose the correct new PDF so that the new estimator has a smaller variance. More specifically, we can re-write as
[TABLE]
where is the expectation w.r.t. the probability measure under which the PDF of is the biased PDF , and is the likelihood ratio defined as
[TABLE]
Since are independent, we can write the likelihood ratio in terms of the marginal PDFs of , i.e.,
[TABLE]
Thus, the IS estimator of is given by
[TABLE]
where the sequence is sampled according to the biased PDF for each realization . The likelihood term can be interpreted as a weight that corrects the bias caused by the sampling from the biased PDF . In fact, it see that the IS estimator of is unbiased.
The efficiency of the proposed IS estimator compared to naive MC can be measured by many criteria. When it comes to evaluating very low probabilities, IS estimators endowed with the bounded relative error property are desirable. A naive MC estimator requires a number of samples that grows as . To achieve the same accuracy, the number of simulation runs needed by an IS estimator with a bounded relative error remains bounded, independently of . Mathematically speaking, we say that the IS estimator satisfies the bounded relative error criterion if the following statement holds
[TABLE]
To have a clear idea of the gain that the proposed IS estimator achieves compared to the naive MC one, we determine the number of simulation runs required by both estimators when the accuracy requirement is fixed, e.g., when the relative error of both estimators is assumed to be the same. We start by defining the relative error of both estimators as
[TABLE]
where we take , which corresponds to a confidence interval, and denotes the variance w.r.t. the probability measure under which the PDF of Z is .
4 Proposed IS Scheme
4.1 Real Valued Case
Our approach consists of shifting the mean and scaling the variance of each variate so that the marginal biased PDF is written as
[TABLE]
While the original PDF of , , is a standard Gaussian, the biased PDF corresponds to a Gaussian with mean and variance . In our approach, we choose the parameter , hoping that the event of interest becomes no longer rare. A possible solution is to look for in the form , where is a positive parameter such that the mean of under the biased PDF is equal to . That is,
[TABLE]
Using the linearity of the expected value and the fact that, under the new probability measure, are zero mean Gaussian RVs with variance , we get
[TABLE]
The above value of is clearly non-negative since the eigenvalues are all non-negative. As the threshold approaches zero, the values of become smaller, leading to the reduction of the variance of the IS estimator. Defining the biased PDFs using the values of obtained in (23), we show that our proposed IS estimator satisfies the bounded relative error property.
Proposition 2**.**
Let the marginal biased PDFs be defined as in (21), and as in (23). Then, the IS estimator (17) of the probability , given by (2), satisfies
[TABLE]
Proof.
We recall the definition of the likelihood ratio,
[TABLE]
A trivial upper bound for the likelihood ratio is given by
[TABLE]
With the choice of given in (23), we get
[TABLE]
Using the above upper bound of the likelihood ratio, we write
[TABLE]
Thus, we obtain the upper bound
[TABLE]
From Proposition 1, we have
[TABLE]
Using [12, Eq.(8)], we have the asymptotic expansion around of , i.e.,
[TABLE]
Therefore, as , we have
[TABLE]
and we can write
[TABLE]
By combining (29) and (33), we obtain
[TABLE]
∎
Remark 2**.**
If is zero, then is also zero. In this case, we use the same IS scheme, i.e., we introduce the biased PDF to be Gaussian with zero mean and variance . In the proof, we use the following lower bound, derived using similar reasoning but involving the CDFs of central Chi-squared RVs
[TABLE]
where is the lower incomplete Gamma function defined as [13, Eq. (8.350.1)]
[TABLE]
As , we have the upper bound
[TABLE]
The proposed IS estimator also has the bounded relative error property in this case, since
[TABLE]
4.2 Complex Valued Case
In this section, we briefly show how the proposed approach is still valid even if we consider the complex case for which the probability is
[TABLE]
where now is a complex Gaussian random vector, is its conjugate transpose, and is a Hermitian positive definite matrix. The complex setting is of paramount importance in many applications involving wireless techniques [31], [29], [30], [32], [28]. Using similar arguments, we write
[TABLE]
where is the module of a complex number. Using the fact that the random variable has a central Chi-squared distribution with 2 degrees of freedom, we obtain the bound
[TABLE]
We write both the original and biased PDF of , , in the complex scenario as
[TABLE]
Using a similar manipulation to the real-valued case, and using the same expression of as in (23), we get
[TABLE]
Thus, we can say that the proposed IS estimator maintains the bounded relative error in the complex-valued case since
[TABLE]
Remark 3**.**
The upper bound for the relative error of both the real-valued and complex-valued cases suffers from an exponential deterioration w.r.t. , i.e. the size of the Gaussian RV .
5 Numerical Examples
To show the accuracy and efficiency of the proposed IS scheme compared to naive MC, we consider three examples. The first example is a toy example where we compute the probability for a given scenario. In this example, we consider real Gaussian random vectors and the non-zero mean case. The second example is inspired by the wireless communication field. We estimate the outage probability of diversity receivers over correlated Gamma fading channels. This case corresponds to the real case, but with zero-mean Gaussian random vectors. In the third example, we show how the IS approach can be extended to the complex case by estimating the outage probability of diversity receivers over correlated Rician fading channels.
5.1 Example 1: Toy Example
5.1.1 Problem Setup
In this example, we compute the probability when the matrix is defined as
[TABLE]
. The mean and covariance matrix of the random vector are given by
[TABLE]
. To have evaluate the efficiency of the proposed IS scheme, we introduce a metric that measures the improvement in terms of the number of simulation runs. The efficiency indicator, , is defined as
[TABLE]
For a fixed number of simulation runs , this metric can also be interpreted as a measure of variance reduction.
5.1.2 Results and Discussions
In Table 1, we provide a comparison between the efficiencies of the MC and IS estimators of . For a specific range of the threshold , we compute the MC and IS estimates, their relative errors, and the efficiency indicator, for the same number of simulation runs .
For relatively high values of , the IS and MC estimates match. However, as the threshold becomes smaller, i.e., the probability of interest becomes smaller, the MC estimate becomes less accurate unlike the proposed IS scheme. While both estimates are built using the same number of simulation runs, the IS relative error remains bounded no matter how small the probability becomes. On the other hand, MC relative error tends to increase as the threshold decreases, which is consistent with the observation that the MC method becomes less accurate when estimating rare events. Finally, we quantify the gain in terms of the number of simulation runs (or, equivalently, in terms of variance reduction) by considering the efficiency indicator. We observe that the smaller the probability becomes, the larger the gain (or, equivalently, the smaller the reduction) becomes, reflecting the efficiency of the proposed scheme compared to naive MC. In fact, when the probability of interest is of the order of , a gain in terms of the number of simulation runs of the order of is achieved. This gain tends to increase as the probability becomes smaller since naive MC requires more samples to estimate the probability than the proposed IS estimator. In contrast, the bounded relative error property of the IS estimator causes the number of required simulation runs to remain almost constant no matter how small the probability becomes.
5.2 Example 2: Maximum Ratio Combining Over Correlated Nakagami-m Fading Channel
5.2.1 Problem Setup
In general, the problem of finding the left tail of quadrature form in Gaussian random vectors has many applications in the wireless communication filed [33], [34]. To combat the attenuation of the received signal in wireless communication systems, different diversity techniques can be used. In fact, there are in particular more or less complex linear combination techniques which make it possible to recover a signal with a good average level. Among these diversity techniques, we find the maximum ratio combining (MRC) technique. The MRC technique improves the average power of the output signal by forming it from the maximum signal of all the branches. The instantaneous signal-to-noise ratio (SNR) expression at the MRC diversity receiver, is given by
[TABLE]
where is the average SNR at each branch and where, for each , follows a Gamma distribution with PDF
[TABLE]
where are respectively the shape and scale parameters, respectively, of the PDF, and is the Gamma function defined as [13, Eq. (8.310.1)]
[TABLE]
Assuming that the shape parameter is a multiple of 0.5, then we can write
[TABLE]
where , are independent zero-mean Gaussian RVs.
To quantify the quality of a communication system, we compute a metric called the outage probability. Depending on the transmission technique used and the channel over which the signal is transmitted, this metric measures the probability that the instantaneous SNR drops below a certain threshold , i.e.,
[TABLE]
In the MRC scenario, the outage probability can be written as
[TABLE]
where . To facilitate the modeling of the channel correlation, we introduce the () vector . The joint pdf of the Gaussian vector is
[TABLE]
Thus, we re-write the outage probability as in (2) where is the identity matrix of order and . In other words, the outage probability is given by
[TABLE]
5.2.2 Results and Discussions
In this section, we discuss the efficiency of the proposed IS estimator for the purpose of computing the outage probability of MRC diversity receivers over correlated Gamma fading channels, and we consider the parameter . In Fig. 1, we plot the outage probability of -branch MRC diversity receivers over the correlated Gamma fading model as a function of the threshold for different numbers of branches . The number of simulation runs required to construct the naive MC estimator and the proposed IS estimator are and , respectively. While naive MC presents an accurate estimate for relatively high values of the outage probability, it tends to become erroneous as the outage probability becomes smaller. Although the proposed IS scheme is constructed using fewer of simulation runs, i.e. compared to , it provides an accurate estimate for the outage probabilities even when the probability is very small. We also observe that the outage probability becomes smaller as increases, a well-known observation when using diversity techniques.
To compare the efficiency of both methods, we investigate in Fig. 2 the number of simulation runs required to achieve a relative error for -branch diversity receivers over the correlated Gamma fading model for the three different numbers of branches. For this fixed accuracy requirement, we can see that the number of required simulation runs by naive MC tends to increase rapidly while it remains bounded for the proposed IS scheme. In fact, for dB and (which corresponds to a probability of the order of ), we already observe a reduction of the order of simulation runs for an accuracy requirement is set to a level of . In Fig. 3, we plot for a fixed number of branches , the plot of the outage probability estimates along with its error bars. As in the previous example, the error bars seem to have the same magnitude for the proposed IS scheme, and they are relatively small. The error bars of the naive MC tend to increase as the probability becomes smaller, indicating that the naive MC become less accurate as the events become more rare.
5.3 Example 3: Maximum Ratio Combining Over Correlated Rician Fading Channel
5.3.1 Problem Setup
In this example, we aim to estimate the outage probability of MRC diversity receivers over correlated Rician fading channels. We assume that the fading at each branch follows a Rice distribution with factor . The instantaneous SNR can be expressed as , where is a circularly symmetric complex Gaussian random vector with mean and covariance matrix . The expression of the mean and covariance matrix of can be expressed in terms of the Rician factors as [28]
[TABLE]
where is the correlation matrix of . In this example, we will assume that the correlation matrix has an exponential structure [27], i.e., , .
The outage probability, for a given threshold , is given by
[TABLE]
We can clearly see that this corresponds to the left tail of a complex Gaussian quadratic form with the matrix .
5.3.2 Results and Discussions
We start by plotting the outage probability of MRC diversity receivers over the correlated Rician fading model as a function of the threshold for different values of and for two values of the number of branches (Fig. 4) and (Fig. 6). The case indicates a low correlation between branches while means a high correlation between branches. In these plots, the solid line corresponds to estimates obtained using our proposed IS method, and the dashed one using naive MC. As we can observe from both plots, we can clearly see that although using fewer simulation runs ( less), the proposed IS method accurately estimates the outage probability, unlike naive MC, which fails as the probability becomes smaller (below ).
For both numbers of branches , we plot the required number of simulation runs for a accuracy requirement in Fig. 5 and Fig. 7, respectively. From these plots, we can confirm again that the proposed IS scheme outperforms naive MC. In fact, for a fixed threshold , we notice that the number of simulation runs of IS is far less than the one needed by naive MC to achieve the same accuracy. We also note that while the number of samples of naive MC continues to increase at a high rate as the probability becomes smaller, the number of samples required by the IS estimator remains almost constant as a result of the bounded relative error property.
In Fig. 8, we plot the outage probability of 2-and 4-branch MRC receivers and their error bars to determine the relative error of both methods. We can clearly see that the magnitude of the error bars is larger for naive MC than for the proposed IS scheme, which indicates that the relative error of naive MC tends to increase despite using more simulation runs compared to our IS estimator.
6 Conclusion
In this paper, we proposed efficient IS estimators for the left tail of the positive quadratic form in Gaussian random vectors. We discussed the construction of these estimators in both central and non-central cases, as well as both real and complex settings. We showed that these estimators are endowed with the bounded relative error property, making them an appealing alternative to naive MC, especially when the probability of interest is very small. To validate our results, we presented three examples, a toy example and two examples motivated by wireless communication theory. A clear gain in terms of simulation runs was observed from the numerical simulations, confirming the efficiency of our scheme compared to naive MC. However, we should note that the upper bound of the relative error of the proposed IS scheme became loose as the dimension of the problem at hand becomes larger, in both the real-valued and complex-valued cases.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] G. E. P. Box, Some theorems on quadratic forms applied in the study of analysis of variance problems, ii. effects of inequality of variance and of correlation between errors in the two-way classification, Ann. Math. Statist. 25 (3) (1954) 484–498.
- 2[2] F. Hsuan, P. Langenberg, A. Getson, The 2-inverse with applications in statistics, Linear Algebra and its Applications 70 (1985) 241–248.
- 3[3] S. O. Rice, Distribution of quadratic forms in normal random variables—evaluation by numerical integration, SIAM J. Sci. Stat. Comput. 1 (1980) 438–448.
- 4[4] A. Shapiro, Asymptotic distribution theory in the analysis of covariance structures, South African Statistical Journal 17 (1) (1983) 33–81.
- 5[5] Q. H. Vuong, Likelihood ratio tests for model selection and non-nested hypotheses, Econometrica 57 (2) (1989) 307–333.
- 6[6] L. Tong, J. Yang, R. S. Cooper, Efficient calculation of p-value and power for quadratic form statistics in multilocus association testing, Annals of Human Genetics 74 (3) (2010) 275–285.
- 7[7] J. Bausch, On the efficient calculation of a linear combination of Chi-square random variables with an application in counting string vacua, Journal of Physics A: Mathematical and Theoretical 46 (50) (2013) 505202.
- 8[8] D. Divsalar, M. K. Simon, M. Shahshahani, The performance of trellis-coded mdpsk with multiple symbol detection, IEEE Transactions on Communications 38 (9) (1990) 1391–1403.
