Cramér Type Moderate Deviations for Random Fields
Aleksandr Beknazaryan, Hailin Sang, Yimin Xiao

TL;DR
This paper investigates Cramér type moderate deviations for partial sums of random fields, utilizing the conjugate method, with applications to linear random fields and nonparametric regression errors.
Contribution
It introduces new results on moderate deviations for random fields, extending classical theory to complex dependence structures and practical regression models.
Findings
Established Cramér type moderate deviation results for linear random fields.
Extended applicability to nonparametric regression with random field errors.
Demonstrated the effectiveness of the conjugate method in this context.
Abstract
We study the Cramér type moderate deviation for partial sums of random fields by applying the conjugate method. The results are applicable to the partial sums of linear random fields with short or long memory and to nonparametric regression with random field errors.
Aleksandr Beknazaryan (a), Hailin Sang (a) and Yimin Xiao (b)
(a) Department of Mathematics, The University of Mississippi, University, MS 38677, USA. E-mail: [email protected], [email protected]
(b) Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA. E-mail: [email protected]
**Abbreviated Title:** Moderate deviations for random fields
Keywords: Cramér type moderate deviation, long range dependence, nonparametric regression, spatial linear process, random field.
MSC 2010 subject classification: 60F10, 60G60, 62E20
1 Introduction
In this paper we study the Cramér type moderate deviations for random fields, in particular linear random fields (often called spatial linear processes in the statistics literature) with short or long memory (short or long range dependence). The study of moderate deviation probabilities in non-logarithmic form for independent random variables goes back to the 1920s. The first theorem in this field was published by Khinchin (1929), who studied the particular case of Bernoulli random variables. In his fundamental work, Cramér (1938) studied the estimation of the tail probability by the standard normal distribution under the condition that the random variable has a moment generating function in a neighborhood of the origin (cf. (3) below). This condition has been referred to as the Cramér condition. Cramér’s work was improved by Petrov (1954) (see also Petrov (1975, 1995)). Their works have stimulated a large amount of research on moderate and large deviations; see below for a brief (and incomplete) review of the literature related to this paper. Nowadays, the area of moderate and large deviations is not only important in probability but also plays an important role in many applied fields, for instance, the premium calculation problem, risk management in insurance (cf. Asmussen and Albrecher (2010)), nonparametric estimation in statistics (see, e.g., Bahadur and Rao (1960), van der Vaart (1998), Joutard (2006, 2013)), and network information theory (cf. Lee et al. (2016, 2017)).
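Let $X, X_1, X_2, \dots$ be a sequence of independent and identically distributed (i.i.d.) random variables with mean $0$ and variance $\sigma^2$. Let $S_n = \sum_{k=1}^{n} X_k$ ($n \ge 1$) be the partial sums. By the central limit theorem,
$$\lim_{n\to\infty} \sup_{x\in\mathbb{R}} \big| P(S_n > x\sigma\sqrt{n}) - (1-\Phi(x)) \big| = 0,$$
where $\Phi(x)$ is the probability distribution function of the standard normal random variable. If for a suitable sequence $c_n$, we have
$$\lim_{n\to\infty} \sup_{0\le x\le c_n} \left| \frac{P(S_n > x\sigma\sqrt{n})}{1-\Phi(x)} - 1 \right| = 0, \qquad (1)$$
or $P(S_n > x\sigma\sqrt{n}) = (1-\Phi(x))(1+o(1))$ uniformly over $x \in [0, c_n]$, then Eq. (1) is called moderate deviation probability or normal deviation probability for $S_n$ since it can be estimated by the standard normal distribution. We refer to $[0, c_n]$ as a range for the moderate deviation. The most famous result of this kind is the Cramér type moderate deviation. Under Cramér’s condition, one has the following Cramér’s theorem (Cramér (1938), Petrov (1954; 1975, p.218; or 1995, p.178)): If $x \ge 0$ and $x = o(\sqrt{n})$, then
$$\frac{P(S_n > x\sigma\sqrt{n})}{1-\Phi(x)} = \exp\Big\{ \frac{x^3}{\sqrt{n}}\, \lambda\Big(\frac{x}{\sqrt{n}}\Big) \Big\} \Big[ 1 + O\Big(\frac{x+1}{\sqrt{n}}\Big) \Big]. \qquad (2)$$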
Here $\lambda(\cdot)$ is a power series with coefficients depending on the cumulants of the random variable $X$. Eq. (2) provides more precise approximation than (1) which holds uniformly on the range for any . The moderate deviations under Cramér’s condition for independent non-identically distributed random variables were obtained by Feller (1943), Petrov (1954) and Statulevičius (1966). The Cramér type moderate deviation has also been established for the sum of independent random variables with -th moment, . See, for example, Rubin and Sethuraman (1965), Nagaev (1965, 1979), Michel (1976), Slastnikov (1978), Amosova (1979), and Frolov (2005). It should be pointed out that the ranges of the moderate deviations in these references are smaller (e.g., ).
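As a numerical illustration of the classical i.i.d. statement, the following minimal sketch compares the exact tail of a sum of centered Exp(1) variables with the normal tail $1-\Phi(x)$ and with the leading Cramér factor $\exp\{x^3\kappa_3/(6\sigma^3\sqrt{n})\}$. The choice of the exponential distribution, the function names, and the truncation of the Cramér series to its first coefficient $\kappa_3/(6\sigma^3)$ are assumptions made for illustration and are not taken from the paper.

```python
import math

def normal_tail(x):
    """1 - Phi(x) for the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def centered_exp_sum_tail(n, x):
    """Exact P(S_n > x*sqrt(n)) for S_n a sum of n i.i.d. (Exp(1) - 1) variables.

    S_n + n has a Gamma(n, 1) law, and for integer n
    P(Gamma(n, 1) > t) = exp(-t) * sum_{k=0}^{n-1} t^k / k!.
    """
    t = n + x * math.sqrt(n)
    term = math.exp(-t)          # k = 0 term of the sum
    total = term
    for k in range(1, n):
        term *= t / k            # turns t^(k-1)/(k-1)! into t^k/k!
        total += term
    return total

def leading_cramer_factor(n, x, kappa3=2.0, sigma=1.0):
    """exp{x^3/sqrt(n) * kappa3/(6 sigma^3)}: first term of the Cramer series only.

    Defaults are the third cumulant (2) and standard deviation (1) of Exp(1) - 1.
    """
    return math.exp(x ** 3 / math.sqrt(n) * kappa3 / (6.0 * sigma ** 3))

if __name__ == "__main__":
    n, x = 100, 2.0
    ratio = centered_exp_sum_tail(n, x) / normal_tail(x)
    print("exact tail / normal tail:", ratio)
    print("leading Cramer factor   :", leading_cramer_factor(n, x))
```

In this example the ratio of the exact tail to the normal tail is noticeably larger than 1, and the one-term factor accounts for most of that gap; the remaining difference is absorbed by the higher-order terms of the series and the $O((x+1)/\sqrt{n})$ factor.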
The Cramér type moderate deviations for dependent random variables have also been studied in the literature. Ghosh (1974) and Heinrich (1990) studied the moderate deviation for $m$-dependent random variables. Ghosh and Babu (1977) and Babu and Singh (1978a) studied moderate deviation for mixing processes. Grama (1997), Grama and Haeusler (2000, 2006) and Fan, Grama and Liu (2013) investigated the large and moderate deviations for martingales. Babu and Singh (1978b) established moderate deviation results for linear processes with coefficients satisfying . Wu and Zhao (2008) studied moderate deviations for stationary processes under certain conditions in terms of the physical dependence measure. However, it can be verified that the results from Wu and Zhao (2008) can only be applied to linear processes with short memory and their transformations. Recently Peligrad et al. (2013) studied the exact moderate and large deviations for short or long memory linear processes. Sang and Xiao (2018) studied exact moderate and large deviations for linear random fields and applied the moderate deviation result to prove a Davis-Gut law of the iterated logarithm. Nevertheless, in the aforementioned works, the moderate deviations are studied for dependent random variables with -th moment, . The exact moderate deviation for random fields under Cramér’s condition has not been well studied. For example, the optimal range and the exact rate of convergence in (1) have been unknown in the random field setting.
The main objective of this paper is to establish an exact moderate deviation result analogous to (2) for random fields under Cramér’s condition. Our main result is Theorem 2.1 below, whose proof is based on the conjugate method to change the probability measure as in the classical case (see, e.g., Petrov (1965, 1975)). The extension of this method to the random field setting reveals the deep relationship between the tail probabilities and the properties of the cumulant generating functions of the random variables such as the analytic radius and the bounds, for within some ranges related to the sum of the variances and the analytic radius of the cumulant generating functions of these random variables. Compared with the results in Sang and Xiao (2018) for linear random fields, Theorems 2.1 and 3.1 in this paper provide a more precise convergence rate in the moderate deviations and explicit information on the range , which is much bigger than the range in Theorem 2.1 in Sang and Xiao (2018). In Section 3 we show that Theorem 2.1 is applicable to linear random fields with short or long memory and to nonparametric regression analysis. The results there can be applied to approximate the quantiles and tail conditional expectations for the partial sums of linear random fields.
In this paper we use the following notations. For two sequences and of real numbers, means as ; means that as for some constant ; for positive sequences, the notation or means that is bounded. For denote . Section 2 gives the main results. In Section 3 we study the application of the main results in linear random fields and nonparametric regression. All the proofs go to Section 4.
Acknowledgement The authors are grateful to the referee and the Associate Editor for carefully reading the paper and for insightful suggestions that significantly improved the presentation of the paper. The research of Hailin Sang is supported by the Simons Foundation Grant 586789 and the College of Liberal Arts Faculty Grants for Research and Creative Achievement at the University of Mississippi. The research of Yimin Xiao is partially supported by NSF grants DMS-1612885 and DMS-1607089.
2 Main results
Let be a random field with zero means defined on a probability space . Suppose that for each , the random variables are independent and satisfy the following Cramér condition: There is a positive constant such that the cumulant generating function
[TABLE]
where is the disc of radius on the complex plane , and denotes the principal value of the logarithm so that . This setting is convenient for applications to linear random fields in Section 3.
Without loss of generality we assume in this section that . Within the disc , can be expanded in a convergent power series
[TABLE]
where is the cumulant of order of the random variable . We have that and . By Taylor’s expansion, one can verify that a sufficient condition for (3) is the following moment condition
[TABLE]
This condition has been used frequently in probability and statistics, see Petrov (1975, p.55), Johnstone (1999, p.64), Picard and Tribouley (2000, p.301), Zhang and Wong (2003, p.164), among others.
Denote
[TABLE]
[TABLE]
and assume that is well-defined and for each . The following is the main result of this paper.
Theorem 2.1
Suppose that, for all and , there exist non-negative constants such that
[TABLE]
and suppose that as , and
[TABLE]
If and , then
[TABLE]
[TABLE]
where
[TABLE]
is a power series that stays bounded uniformly in for sufficiently small values of and the coefficients only depend on the cumulants of .
For the rest of the paper, we only state the results for . Since stays bounded uniformly in for sufficiently small values of and from the proof of Theorem 2.1, we have the following corollary:
Corollary 2.1
Assume the conditions of Theorem 2.1 hold. Then for with $x = O\big((H_n\sqrt{B_n})^{1/3}\big)$ we have
[TABLE]
Notice that under the condition $x = O\big((H_n\sqrt{B_n})^{1/3}\big)$. Also, taking into account the fact that for
[TABLE]
we obtain the following corollaries:
Corollary 2.2
Under the conditions of Theorem 2.1, we have that for with $x = O\big((H_n\sqrt{B_n})^{1/3}\big)$,
[TABLE]
Corollary 2.3
Assume the conditions of Theorem 2.1 and for all . Then for with $x = O\big((H_n\sqrt{B_n})^{1/3}\big)$, we have
[TABLE]
Also as , as , we have
Corollary 2.4
Under the conditions of Theorem 2.1, if , , then
[TABLE]
for every positive constant .
3 Applications
In this section, we provide some applications of the main result in Section 2. First, we derive a moderate deviation result for linear random fields with short or long memory; then we apply this result to risk measures and use the same argument to study nonparametric regression.
3.1 Cramér type moderate deviation for linear random fields
Let be a linear random field defined on a probability space by
[TABLE]
where the innovations , are i.i.d. random variables with mean zero and finite variances , and where is a sequence of real numbers that satisfy .
Linear random fields have been studied extensively in probability and statistics. We refer to Sang and Xiao (2018) for a brief review of studies on limit theorems and large and moderate deviations for linear random fields, and to Koul et al. (2016), Lahiri and Robinson (2016) and the references therein for recent developments in statistics.
By applying Theorem 2.1 in Section 2, we establish the following moderate deviation result for linear random fields with short or long memory, under Cramér’s condition on the innovations . Compared with the moderate deviation results in Sang and Xiao (2018), our Theorem 3.1 below gives a more precise convergence rate, which holds on a much wider range for .
Suppose that there is a disc centered at within which the cumulant generating function of is analytic and can be expanded in a convergent power series
[TABLE]
where is the cumulant of order of the random variables . We have that and , .
We write
[TABLE]
where . In the setting of Section 2, we have , . Then it can be verified that for all and , satisfy condition (3) for suitably chosen . In the notation of Section 2, we have
[TABLE]
Hence, we can apply Theorem 2.1 to prove the following theorem.
Theorem 3.1
Assume that the linear random field has short memory, i.e.,
[TABLE]
or long memory with coefficients
[TABLE]
where is a constant, is a slowly varying function at infinity and is a continuous function defined on the unit sphere . Suppose that there exist positive constants and such that
[TABLE]
in the disc . Then for all with , we have
[TABLE]
where
[TABLE]
is a power series that stays bounded uniformly in for sufficiently small values of and the coefficients only depend on the cumulants of and on the coefficients of the linear random field.
To the best of our knowledge, Theorem 3.1 is the first result that gives the exact tail probability for partial sums of random fields with dependence structure under the Cramér condition.
Due to its preciseness, Theorem 3.1 can be applied to evaluate the performance of approximation of the distribution of linear random fields by truncation. We often use the random variable with finite terms to approximate the linear random field in practice. For example, the moving average with finite terms is applied to approximate the linear process (moving average with infinite terms). In this case, Theorem 3.1 also applies to the partial sum . Here only finite terms are non-zero. Denote
[TABLE]
Then for all with , we have
[TABLE]
where
[TABLE]
and where the coefficients have similar definition as . To see the difference between the two tail probabilities of the partial sums, we have
[TABLE]
here as in the proof of Theorem 3.1, we take , , , ,
[TABLE]
[TABLE]
If , is dominated by $\exp\big\{\frac{x^{3}}{n^{d/2}}(\beta_{0n}-\beta_{0n}^{m})\big\}$. If , then and can be dominated by $\exp\big\{\frac{x^{4}}{n^{d}}(\beta_{1n}-\beta_{1n}^{m})\big\}$ which depends on whether . In general, Theorem 3.1 can be applied to evaluate whether the truncated version is a good approximation to in terms of the ratio for in different ranges which depends on the property of the innovation and the sequence .
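To make the truncation comparison concrete, here is a minimal sketch in dimension one. It assumes that the partial sum over the index set $\{1,\dots,n\}$ is written as $S_n=\sum_i b_{n,i}\varepsilon_i$ with $b_{n,i}=\sum_{j=1}^{n}a_{j-i}$ and that its variance is $B_n=\sigma^2\sum_i b_{n,i}^2$. The index set, the example coefficients $a_i=(1+i)^{-2}$ and all function names are hypothetical choices for illustration, not the paper's setting.

```python
def partial_sum_weights(coeffs, n):
    """Weights b_{n,i} = sum_{j=1}^{n} a_{j-i} for finitely many non-zero a_k.

    coeffs maps an integer offset k to the coefficient a_k.
    """
    weights = {}
    for j in range(1, n + 1):
        for offset, a in coeffs.items():
            i = j - offset                      # a_{j-i} with j - i = offset
            weights[i] = weights.get(i, 0.0) + a
    return weights

def partial_sum_variance(coeffs, n, sigma2=1.0):
    """B_n = sigma^2 * sum_i b_{n,i}^2 (assumed form of the variance of S_n)."""
    return sigma2 * sum(b * b for b in partial_sum_weights(coeffs, n).values())

def truncate(coeffs, m):
    """Keep only the coefficients a_k with |k| <= m."""
    return {k: a for k, a in coeffs.items() if abs(k) <= m}

if __name__ == "__main__":
    # Hypothetical summable coefficients, already cut off at 200 terms.
    full = {i: (1.0 + i) ** -2.0 for i in range(200)}
    n = 50
    for m in (5, 20, 50):
        ratio = partial_sum_variance(truncate(full, m), n) / partial_sum_variance(full, n)
        print(m, ratio)
```

The variance ratio is only a first diagnostic: the tail comparison in the text also involves the higher cumulants through the coefficients of the two power series.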
Theorem 3.1 can be applied to calculate the tail probability of the partial sums of some well-known dependent models. One example is the autoregressive fractionally integrated moving average (FARIMA) process in the one-dimensional case, introduced by Granger and Joyeux (1980) and Hosking (1981), which is defined as
[TABLE]
Here are nonnegative integers, is the AR polynomial and is the MA polynomial. Under the conditions that and have no common zeros, the zeros of lie outside the closed unit disk and , the FARIMA() process has linear process form with . Here is the gamma function.
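For intuition about the coefficients of such a process, the sketch below computes the moving-average coefficients of the simplest case, FARIMA(0, d, 0) with $0<d<1/2$, for which $\psi_0=1$ and $\psi_i=\Gamma(i+d)/(\Gamma(d)\Gamma(i+1))\sim i^{d-1}/\Gamma(d)$. The restriction to $p=q=0$, the name `d_frac` for the memory parameter, and the function names are assumptions for illustration; the paper's parametrization of the memory parameter may differ.

```python
import math

def farima_coefficients(d_frac, count):
    """First `count` MA(infinity) coefficients of FARIMA(0, d, 0).

    Uses the recursion psi_0 = 1, psi_i = psi_{i-1} * (i - 1 + d) / i,
    which is equivalent to psi_i = Gamma(i + d) / (Gamma(d) * Gamma(i + 1)).
    """
    coeffs = [1.0]
    for i in range(1, count):
        coeffs.append(coeffs[-1] * (i - 1 + d_frac) / i)
    return coeffs

def farima_coefficient_closed_form(d_frac, i):
    """Gamma(i + d) / (Gamma(d) * Gamma(i + 1)), computed on the log scale."""
    return math.exp(math.lgamma(i + d_frac) - math.lgamma(d_frac) - math.lgamma(i + 1))

if __name__ == "__main__":
    d_frac = 0.3
    psi = farima_coefficients(d_frac, 2001)
    i = 2000
    # Compare with the power-law asymptotic i^(d-1) / Gamma(d).
    print(psi[i], i ** (d_frac - 1.0) / math.gamma(d_frac))
```

The slow power-law decay of $\psi_i$ is what makes the coefficient sequence non-summable and places the process in the long memory regime.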
3.2 Approximation of risk measures
Theorem 3.1 can be applied to approximate the risk measures such as quantiles and tail conditional expectations for the partial sums in (8) of linear random field . Given the tail probability , let be the upper -th quantile of . Namely . By Theorem 3.1, for all with ,
[TABLE]
We approximate by , where can be solved numerically from the equation
[TABLE]
The tail conditional expectation is computed as
[TABLE]
which can be solved numerically. The quantile and tail conditional expectation, which are also called value at risk (VaR) and expected shortfall (ES), respectively, in finance and risk theory, are important measures for modeling the extremal behavior of random variables in practice. The precise moderate deviation results in this article provide a tool for computing these two measures for time series or spatial random fields. See Peligrad et al. (2014a) for a brief review of VaR and ES in the literature and a study of them when a linear process has -th moment () or has a regularly varying tail with exponent .
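A numerical solution of this kind can be sketched as follows. The example solves $(1-\Phi(x))\exp(c\,x^3)=\alpha$ for $x$ by bisection, where the constant `c` is a hypothetical stand-in for the first-order correction that Theorem 3.1 supplies; the function names, the bracket $[0,10]$ and the one-term form of the correction are all assumptions for illustration. The bisection relies on the corrected tail being decreasing on the bracket, which holds when `c` is small relative to the bracket length.

```python
import math

def normal_tail(x):
    """1 - Phi(x) for the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def corrected_tail(x, c):
    """Normal tail times a one-term correction exp(c * x^3); c is hypothetical."""
    return normal_tail(x) * math.exp(c * x ** 3)

def solve_quantile(alpha, c, lo=0.0, hi=10.0, tol=1e-12):
    """Solve corrected_tail(x, c) = alpha for x in [lo, hi] by bisection."""
    def f(x):
        # Work on the log scale to keep the comparison well conditioned.
        return math.log(corrected_tail(x, c)) - math.log(alpha)
    if not (f(lo) > 0.0 > f(hi)):
        raise ValueError("alpha is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    alpha = 0.025
    print("normal quantile   :", solve_quantile(alpha, 0.0))
    print("corrected quantile:", solve_quantile(alpha, 0.01))
```

The solution `x` is on the standardized scale; the approximate quantile of the partial sum itself would be `x` times the normalizing standard deviation.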
3.3 Nonparametric regression
Consider the following regression model
[TABLE]
where is a bounded continuous function on , ’s are the fixed design points over with values in a compact subset of , and is a linear random field over , where the i.i.d. innovations satisfy the same conditions as in Subsection 3.1. The kernel regression estimation for the function on the basis of sample pairs , has been studied by Sang and Xiao (2018) under the condition that the i.i.d. innovations satisfy for some and (or) the innovations have regularly varying right tail with index . See Sang and Xiao (2018) for more references in the literature for regression models with independent or weakly dependent random field errors.
We study the kernel regression estimation for the function on the basis of sample pairs , , when the i.i.d. innovations satisfy the conditions as in Subsection 3.1. As in Sang and Xiao (2018) and the other references in the literature, the estimator that we consider is given by
[TABLE]
where the weight functions ’s on have form
[TABLE]
Here is a kernel function and is a sequence of bandwidths which goes to zero as . Notice that the weight functions satisfy the condition .
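The following minimal sketch shows a kernel estimator of this type in one dimension. It assumes normalized weights of the form $w_j(z)=K((z-z_j)/h)\big/\sum_i K((z-z_i)/h)$, which sum to one, together with a Gaussian kernel; the exact weight form, the kernel, and the function names are assumptions for illustration and may differ from the paper's definition.

```python
import math

def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the weights."""
    return math.exp(-0.5 * u * u)

def kernel_weights(z, design, h, kernel=gaussian_kernel):
    """Normalized weights w_j(z) over the fixed design points; they sum to 1."""
    raw = [kernel((z - zj) / h) for zj in design]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no design point receives positive kernel mass")
    return [r / total for r in raw]

def kernel_estimate(z, design, responses, h, kernel=gaussian_kernel):
    """Weighted average of the responses with the kernel weights at z."""
    weights = kernel_weights(z, design, h, kernel)
    return sum(w * y for w, y in zip(weights, responses))

if __name__ == "__main__":
    design = [i / 10.0 for i in range(11)]          # fixed design on [0, 1]
    responses = [2.0 * zj + 1.0 for zj in design]   # noiseless linear signal
    print(kernel_estimate(0.5, design, responses, h=0.2))
```

Because the weights sum to one, the estimation error at a point `z` splits into a deterministic bias term and a weighted sum of the errors, and it is the latter to which the moderate deviation argument applies.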
For a fixed , let
[TABLE]
where . Let , . By the same analysis as in the proof of Theorem 3.1, we take and derive a moderate deviation result for . That is, if as , , then
[TABLE]
A similar bound can be derived for $P\big(|S_{n}(z)|>x\sqrt{B_{n}(z)}\big)$. Notice that these tail probability estimates are more precise than those obtained in Sang and Xiao (2018), where an upper bound for the law of the iterated logarithm of was derived. With the more precise bound on the tail probability in (14) and certain assumptions on and the fixed design points [cf. Gu and Tran (2009)], one can construct a confidence interval for .
More interestingly, our method in this paper provides a way for constructing confidence bands for the function when , where is a compact interval. Observe that for any , we can write
[TABLE]
Under certain regularity assumptions on and the fixed design points [cf. Gu and Tran (2009)], we can apply the argument in Subsection 3.1 to derive an exponential upper bound for the tail probability $P\big(|S_{n}(z)-S_{n}(z^{\prime})|>x\sqrt{B_{n}(z,z^{\prime})}\big)$, where $B_{n}(z,z^{\prime})=\sigma^{2}\sum_{j\in\mathbb{Z}^{d}}\big(b_{n,j}(z)-b_{n,j}(z^{\prime})\big)^{2}$. Such a sharp upper bound, combined with the chaining argument [cf. Talagrand (2014)], would allow us to derive an exponential upper bound for
[TABLE]
which can be applied to derive the uniform convergence rate of for all and to construct a confidence band for the function . Carrying out this project rigorously is non-trivial and the verification of the details is somewhat lengthy. Hence we will consider it elsewhere.
4 Proofs
**Proof of Theorem 2.1**
Since , the cumulant generating function of can be written as
[TABLE]
Cauchy’s inequality for the derivatives of analytic functions together with the condition (4) yields that
[TABLE]
By following the conjugate method (cf. Petrov (1965, 1975)), we now introduce an auxiliary sequence of independent random variables , , with the distribution functions
[TABLE]
where and is a real number whose value will be specified later.
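The measure change behind the conjugate method can be illustrated on a single discrete distribution. In the classical construction (cf. Petrov (1975)), a law $F$ with cumulant generating function $L(h)=\log E e^{hX}$ is replaced by the tilted law $d\bar F(x)=e^{hx-L(h)}\,dF(x)$, whose mean equals $L'(h)$. The sketch below checks this numerically; the three-point distribution and the function names are hypothetical, and the code illustrates the general technique rather than the paper's specific auxiliary variables.

```python
import math

def cgf(values, probs, h):
    """Cumulant generating function L(h) = log E exp(hX) of a discrete law."""
    return math.log(sum(p * math.exp(h * v) for v, p in zip(values, probs)))

def tilt(values, probs, h):
    """Probabilities of the conjugate (exponentially tilted) law at parameter h."""
    L = cgf(values, probs, h)
    return [p * math.exp(h * v - L) for v, p in zip(values, probs)]

def mean(values, probs):
    """Expectation of a discrete law."""
    return sum(v * p for v, p in zip(values, probs))

if __name__ == "__main__":
    values = [-1.0, 0.0, 2.0]     # hypothetical zero-mean distribution
    probs = [0.5, 0.25, 0.25]
    h, delta = 0.3, 1e-5
    tilted = tilt(values, probs, h)
    numeric_derivative = (cgf(values, probs, h + delta) - cgf(values, probs, h - delta)) / (2 * delta)
    print(mean(values, tilted), numeric_derivative)
```

Choosing `h` so that the tilted mean matches the deviation level of interest moves that level into the central region of the new measure, where a normal approximation is accurate.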
Denote
[TABLE]
[TABLE]
[TABLE]
and
[TABLE]
Note that, in the above and below, we have suppressed for simplicity of notations.
We shall see in the later analysis that the quantities and are well-defined for every and with where is a positive constant which is independent of . Throughout the proof we will obtain some estimates holding for the values of satisfying where the positive constant may vary but is always independent of . We will then take to be the smallest one among those constants . The selection of the constants does not affect the proof since the we need in the later analysis has property .
Also, the change of the order of summation of double series presented in the proof is justified by the absolute convergence of those series in the specified regions.
**Step 1: Representation of in terms of the conjugate measure**
First notice that by equation (2.11) on page 221 of Petrov (1975), for any , we have
[TABLE]
Note that the condition (5) implies that . From (15) it follows that for any with and for any we have
[TABLE]
Therefore, for any with and with ,
[TABLE]
Hence, is well-defined and converges to in distribution or equivalently in probability or almost surely as .
For the in , let and . By Markov’s inequality, we have
[TABLE]
Hence, by (17) we have that for ,
[TABLE]
Applying Theorem 2.20 from van der Vaart (1998), we have
[TABLE]
as . And taking into account that
[TABLE]
and
[TABLE]
as we obtain from (16) that
[TABLE]
**Step 2: Properties of the conjugate measure**
From the calculation of (18) it follows that the cumulant generating function of the random variable exists when is sufficiently small and we have
[TABLE]
. Denoting by the cumulant of order of the random variable , we obtain
[TABLE]
Setting and we find that
[TABLE]
and
[TABLE]
Hence, for , (21) implies
[TABLE]
which means that is well-defined and, as a function of , is analytic in .
Also, without loss of generality, we assume that
[TABLE]
By the definition of and (21), we have
[TABLE]
It follows from (15) that
[TABLE]
for and a suitable positive constant which is independent of and . This together with (25) implies that for
[TABLE]
Taking into account the condition (24), we get that
[TABLE]
Moreover, (25) implies that for ,
[TABLE]
Also, by the definition of and (22), we have
[TABLE]
It follows from (15) that
[TABLE]
for and a suitable positive constant which is independent of and . This together with (28) implies that for , is well-defined and
[TABLE]
Condition (24) then implies that
[TABLE]
Furthermore, (28) and (15) imply that for ,
[TABLE]
**Step 3: Selection of**
Let be the real solution of the equation
[TABLE]
and let
[TABLE]
Then
[TABLE]
By (23) we know that is analytic in a disc and
[TABLE]
in that disc. It follows from Bloch’s theorem (see, e.g., Privalov (1984), page 256) that (33) has a real solution which can be written as
[TABLE]
for
[TABLE]
Moreover, the absolute value of that sum in (34) is less than . Condition (5) implies that there exists a disc with center at and radius that does not depend on within which the series on the right side of (34) converges.
It can be checked from (33) and (34) that
[TABLE]
Cauchy’s inequality implies that for every ,
[TABLE]
Therefore, as , becomes the dominant term of the series in (34). Hence, for sufficiently large we have
[TABLE]
and taking into account (32) we get
[TABLE]
It follows from (17) and (23) that for ,
[TABLE]
For the solution of the equation (31) we also have
[TABLE]
where with .
Recall that the series converges in the disc centered at with radius that does not depend on , and the absolute value of this sum is less than . We see from (37) that the function is obtained by the substitution of in a series that converges on the interval . It follows from Cauchy’s inequality that
[TABLE]
which means that for , stays bounded uniformly in . In particular, by (35) and (37), we have .
From now on we will assume that is the unique real solution of the equation (31).
**Step 4: The case**
Now we prove the theorem for the case using the method presented in Petrov and Robinson (2006). Throughout the proof, denotes a positive constant which may vary from line to line, but is independent of and . If is the characteristic function of we then have that for
[TABLE]
Then
[TABLE]
Thus, using (15) we get that for , with ,
[TABLE]
Then, for appropriate choice of we have that
[TABLE]
for . Now applying Theorem 5.1 from Petrov (1995) with and we get that
[TABLE]
Since , as , and $\lambda_{n}\Big(\frac{x}{H_{n}\sqrt{B_{n}}}\Big)$ is bounded uniformly in , we have
[TABLE]
Together with condition (5), to have (6) in the case , it is sufficient to show
[TABLE]
which is given by (38), since for .
So we will limit the proof of the theorem to the case .
**Step 5: The case**
Making a change of variables and applying (31), we can rewrite (19) as
[TABLE]
Denote and we show that for sufficiently large
[TABLE]
Let be the characteristic function of . We then have that
[TABLE]
Then by (20) for and we have that
[TABLE]
where . For and , with , we have that
[TABLE]
Thus,
[TABLE]
Then, for appropriate choice of we have that
[TABLE]
for . Now applying (29) and Theorem 5.1 from Petrov (1995) with and , we have (40).
By (40) we have
[TABLE]
where
Denote
[TABLE]
and
[TABLE]
where
[TABLE]
is the Mills ratio which is known to satisfy
[TABLE]
for all . Hence, by (36) and (29) we obtain
[TABLE]
Hence,
[TABLE]
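The Mills ratio used in this step, $\psi(x)=(1-\Phi(x))/\Phi'(x)$, is easy to evaluate numerically. The sketch below checks it against the classical two-sided bound $x/(1+x^2)<\psi(x)<1/x$ for $x>0$. This particular bound is a standard fact chosen here for illustration and is not necessarily the inequality used in the proof; the function name is hypothetical.

```python
import math

def mills_ratio(x):
    """psi(x) = (1 - Phi(x)) / phi(x) for the standard normal distribution.

    Accurate for moderate x; for very large x both terms underflow.
    """
    tail = 0.5 * math.erfc(x / math.sqrt(2.0))
    density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return tail / density

if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 5.0, 10.0):
        # Lower bound, Mills ratio, upper bound.
        print(x, x / (1.0 + x * x), mills_ratio(x), 1.0 / x)
```

Both bounds behave like $1/x$ for large $x$, which is why factors of order $x+1$ appear when the tail $1-\Phi(x)$ is compared with the density.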
For every we have that , where . As for , , then using (5), (36), (26), (27), (29) and (30) we get that
[TABLE]
Hence,
[TABLE]
which means that
[TABLE]
Finally, combining (4), (31), (32), (37), (41) and (42) we get
[TABLE]
By (43) and the fact that , we see that
[TABLE]
This proves (6). The proof of (7) follows the same pattern and is omitted.
**Proof of Theorem 3.1**
Since , we see that the cumulant generating function of the random variable is given by
[TABLE]
Cauchy’s inequality for the derivatives of analytic functions together with the condition (11) yields that
[TABLE]
Denote . Then by (44), for any with and for any with we have
[TABLE]
Hence,
[TABLE]
Then by Theorem 2.1, if as , we have
[TABLE]
for .
If the linear random field has long memory then we have that (see Surgailis (1982), Theorem 2) . As the function is bounded, then for we have
[TABLE]
where we have used the fact (see Bingham et al. (1987) or Seneta (1976)) that for a slowly varying function defined on and for any ,
[TABLE]
It follows from the definition of in (10) that (for sufficiently large ) is attained at some . Hence, . We take which yields
[TABLE]
Then the result follows from (45).
If the linear random field has short memory, i.e., we can take and . Moreover, we also have
[TABLE]
and
[TABLE]
which means that .
As for all we have that by the definition of , then
[TABLE]
On the other hand, for we have that for sufficiently large . Hence,
[TABLE]
Thus, and the result follows from (45).
References
[1] Amosova, N. N., 1979. On probabilities of moderate deviations for sums of independent random variables. Teor. Veroyatn. Primen. 24, 858–865.
[2] Asmussen, S. and Albrecher, H., 2010. Ruin Probabilities. World Scientific, Hackensack, NJ.
[3] Babu, G. J. and Singh, K., 1978a. Probabilities of moderate deviations for some stationary strong-mixing processes. Sankhyā Ser. A 40, 38–43.
[4] Babu, G. J. and Singh, K., 1978b. On probabilities of moderate deviations for dependent processes. Sankhyā Ser. A 40, 28–37.
[5] Bahadur, R. and Rao, R. R., 1960. On deviations of the sample mean. Ann. Math. Statist. 31, 1015–1027.
[6] Bingham, N. H., Goldie, C. M. and Teugels, J. L., 1987. Regular Variation. Cambridge University Press, Cambridge, UK.
[7] Cramér, H., 1938. Sur un nouveau théorème-limite de la théorie des probabilités. Actual. Sci. et Ind., Paris, 736.
[8] Fan, X., Grama, I. G. and Liu, Q., 2013. Cramér large deviation expansions for martingales under Bernstein's condition. Stoch. Process. Appl. 123, 3919–3942.
