On Minimax Detection of Gaussian Stochastic Sequences with Imprecisely Known Means and Covariance Matrices
Marat V. Burnashev

TL;DR
This paper investigates the minimax detection of Gaussian sequences with uncertain means and covariances, identifying conditions under which composite hypotheses can be simplified without loss of detection performance.
Contribution
It characterizes the maximal set of means and covariance matrices allowing composite hypothesis testing to be replaced by simple hypothesis testing without affecting the detection exponent.
Findings
Complete description of the maximal set of parameters
Conditions for equivalence between composite and simple hypothesis testing
Analysis of detection exponent under uncertainty
Abstract
We consider the problem of detecting (testing) Gaussian stochastic sequences (signals) with imprecisely known means and covariance matrices. The alternative is independent identically distributed zero-mean Gaussian random variables with unit variances. For a given false alarm (1st-kind error) probability, the quality of minimax detection is given by the best miss probability (2nd-kind error probability) exponent over a growing observation horizon. We explore the maximal set of means and covariance matrices (composite hypothesis) such that its minimax testing can be replaced with testing a single particular pair consisting of a mean and a covariance matrix (simple hypothesis) without degrading the detection exponent. We completely describe this maximal set.
Problems of Information Transmission, vol. 58, no. 3, pp. 70–84, 2022.
Key words and phrases: Minimax testing of hypotheses, error exponent, type-I error probability, type-II error probability, Stein’s exponent.
1 Introduction and the Main Results
1.1 Problem Setting
One of the traditional problems of testing simple hypotheses and , concerning a Gaussian signal vector in a Gaussian noise background (i.e., the problem of detecting a signal against a noise background), based on observations , has the form
[TABLE]
where the sample represents “noise” and consists of independent identically distributed Gaussian random variables with zero means and unit variances, and is the identity covariance matrix. The stochastic “signal” is a Gaussian random vector with known mean and known covariance matrix .
However, in practice we usually do not know the mean and the matrix precisely, and then the observation model (1) actually takes the form
[TABLE]
where is a given set of possible means , and is a given set of possible covariance matrices (possibly depending on ). For convenience, we denote
[TABLE]
Further, for the model (2) we consider the problem of minimax testing [1, 2, 3] of the simple hypothesis against the composite alternative , based on observations . If a set is chosen for deciding in favor of , such that
[TABLE]
then the 1st-kind error probability (“false alarm”) and the 2nd-kind error probability (“miss probability”) are defined, respectively, by the formulas
[TABLE]
and
[TABLE]
We are interested in the minimal possible 2nd-kind error probability (see (4) and (5)) for a given 1st-kind error probability , :
[TABLE]
and in the corresponding optimal decision set from (3).
In this paper we consider the case when the value is fixed (or vanishes slowly as increases). This case is sometimes called the Neyman–Pearson problem of minimax hypothesis testing. Here the 1st-kind and 2nd-kind errors entail very different losses for the statistician, who is mainly interested in minimizing the 2nd-kind error probability . This case is quite popular in various applications (see, e.g., [4] and the bibliography therein).
For a given mean , matrix and value , denote by the minimal possible 2nd-kind error probability (see (6)). The corresponding optimal decision set is described by the Neyman–Pearson lemma [1, 2]. Clearly,
[TABLE]
For a fixed and given sets , also denote by the minimal possible 2nd-kind error probability (see (6)). Then, similarly to (7), we have
[TABLE]
In many practical cases the value decreases exponentially in . Therefore it is natural (and, in any case, simpler and more productive) to investigate the corresponding exponents and as (some results on equality in (8) are contained in [5]).
In this paper we investigate the sets , for which the following asymptotic equality holds in (8):
[TABLE]
The motivation for studying minimax hypothesis testing (signal detection) is described in detail in [1, 2, 3, 4]. If relation (9) holds for given sets of means and matrices, then we may replace (without asymptotic losses) the entire set by the particular pair . Recall that the optimal test for a particular pair is described by the Neyman–Pearson lemma and reduces to the simple likelihood ratio test (LR-test). Otherwise (if relation (9) fails), the optimal minimax test is a much more complicated Bayes test with respect to the least favorable prior distribution on the set . Therefore it is natural to investigate when the given set can be replaced by a particular pair . From a technical viewpoint, however, it is more convenient to consider the equivalent problem: for a given pair , find the maximal set of pairs that can be replaced by the pair . This is the problem mainly considered in the paper.
Remark 1. Models (1) and (2) can be reduced to equivalent models with a diagonal matrix . Indeed, since is a covariance matrix (i.e., symmetric and positive definite), there exist an orthogonal matrix and a diagonal matrix such that (see [6, §§ 4.7–4.9] and [7, Theorem 4.1.5]). Moreover, the diagonal entries of are the eigenvalues of the matrix . Note also that for any orthogonal matrix the vector has the same distribution as (for the simple hypothesis of (2)). Therefore, multiplying both sides of (2) by , we may reduce the model (2) to the equivalent case with a diagonal matrix .
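A minimal numerical sketch of the reduction in Remark 1, assuming NumPy is available: `numpy.linalg.eigh` returns an orthogonal matrix `T` and the eigenvalues `lam` of a symmetric matrix, so that `T.T @ M @ T` is diagonal, while rotating white noise by `T.T` leaves its covariance equal to the identity. The test matrix `M` and the helper name `diagonalize_covariance` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def diagonalize_covariance(M):
    """Return (T, lam) with T orthogonal and T.T @ M @ T = diag(lam)."""
    # eigh is meant for symmetric matrices; eigenvalues come back in ascending order
    lam, T = np.linalg.eigh(M)
    return T, lam

rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))
M = B @ B.T + n * np.eye(n)          # illustrative symmetric positive definite matrix
T, lam = diagonalize_covariance(M)

# Rotating by T.T turns the covariance M into the diagonal matrix diag(lam) ...
D = T.T @ M @ T
# ... while white noise xi ~ N(0, I_n) keeps its distribution, since T.T @ I @ T = I.
xi = rng.standard_normal((200_000, n))
rotated = xi @ T                     # each row is (T.T @ x).T for the sample x in that row
emp_cov = np.cov(rotated, rowvar=False)
print(np.round(lam, 3))
print(np.round(emp_cov, 3))
```

The same rotation applied to the signal mean simply replaces it by its image under `T.T`, which is why only the eigenvalues of the covariance matrix matter after the reduction.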
Definition 1. For a fixed , and a given sequence of pairs , denote by the sequence of the largest sets of pairs such that equality (9) takes the form
[TABLE]
Clearly, .
In other words, for a given 1st-kind error probability the sequence is the largest set of pairs that can be replaced (without asymptotic losses for ) by a single pair . Below (Theorem 1) we describe the largest set satisfying (10). This generalizes a similar result of [8], where the case was considered. It also strengthens a similar result of [4], where some lower bounds for the set were obtained.
It is convenient first to investigate the maximal sets , similar to , that arise if the LR-detector (see Definition 2) is used. It will be shown that , i.e., the LR-detector is asymptotically optimal.
In models (1) and (2), denote by the distribution of the value , where . Similarly, denote by , , the distribution of the value , where . Denote also by and , , the corresponding probability densities. For an ()-matrix denote . Note that if , then
[TABLE]
For , also introduce the log-likelihood ratio (see (11))
[TABLE]
Consider first LR-detectors. When the simple hypotheses and are tested, introduce the corresponding decision sets in favor of the hypothesis (i.e., in favor of the matrix ):
[TABLE]
where is such that (see (12))
[TABLE]
Definition 2. For a fixed and a given sequence of pairs , denote by the sequence of the largest sets of pairs such that
[TABLE]
provided the decision sets are used.
Below in Theorem 2 the set for the model (2) is described.
We shall also need the following definition [9].
Definition 3. For probability measures $\mathbf{P}$ and $\mathbf{Q}$ on a measurable space $(\mathcal{X}, \mathcal{B})$, introduce the function (the Kullback–Leibler distance, or divergence, of the measures $\mathbf{P}$ and $\mathbf{Q}$)
$$D(\mathbf{P}\,\|\,\mathbf{Q}) = \mathbf{E}_{\mathbf{P}} \ln \frac{d\mathbf{P}}{d\mathbf{Q}}, \qquad (16)$$
where the expectation is taken over the measure $\mathbf{P}$.
Using formulas (11) and (16) we have
[TABLE]
where are the eigenvalues (all positive) of the covariance matrix , and .
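The explicit form of (17) did not survive in this copy. As a hedged reference point, the sketch below (assuming NumPy) implements the standard closed-form divergence between two Gaussian distributions and its specialization to the white-noise measure, written through the eigenvalues of the covariance matrix. Which direction of the divergence enters (17) cannot be read off here, so both directions are given; the function names are hypothetical.

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """Standard closed form of D(N(mu0, S0) || N(mu1, S1)) in nats."""
    mu0, mu1 = np.asarray(mu0, float), np.asarray(mu1, float)
    n = mu0.size
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) - n + diff @ S1_inv @ diff + logdet1 - logdet0)

def kl_to_white(a, lam):
    """D(N(a, M) || N(0, I_n)) through the eigenvalues lam of M."""
    lam = np.asarray(lam, float)
    return 0.5 * np.sum(lam - 1.0 - np.log(lam)) + 0.5 * float(np.dot(a, a))

def kl_from_white(a, M):
    """D(N(0, I_n) || N(a, M)) through the eigenvalues of M and the form a' M^{-1} a."""
    lam = np.linalg.eigvalsh(M)
    return 0.5 * np.sum(1.0 / lam - 1.0 + np.log(lam)) + 0.5 * float(a @ np.linalg.solve(M, a))

rng = np.random.default_rng(1)
n = 5
B = rng.standard_normal((n, n))
M = B @ B.T + np.eye(n)              # illustrative covariance matrix
a = rng.standard_normal(n)           # illustrative mean
zero, eye = np.zeros(n), np.eye(n)
print(kl_gauss(a, M, zero, eye), kl_to_white(a, np.linalg.eigvalsh(M)))
print(kl_gauss(zero, eye, a, M), kl_from_white(a, M))
```

Each summand lam_i - 1 - ln(lam_i) is nonnegative and vanishes only at lam_i = 1, so the divergence is zero exactly when the mean is zero and the covariance is the identity.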
1.2 Assumptions
In the model (2), denote by the eigenvalues (all positive) of the covariance matrix . We assume that the following conditions are satisfied:
I. For all covariance matrices the following limit exists (see (17)):
[TABLE]
(note that , ).
II. For some we have
[TABLE]
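Assumption I asks that the divergence (17), normalized by the number of observations, have a limit. A hedged illustration, assuming NumPy, zero mean, the direction D(N(0, M_n) || N(0, I_n)), and a hypothetical stationary example M_n[i, j] = rho^|i-j| (an AR(1)-type correlation matrix): since tr M_n = n and det M_n = (1 - rho^2)^(n-1), the normalized value equals -(n-1) ln(1 - rho^2) / (2n) and tends to -ln(1 - rho^2) / 2.

```python
import numpy as np

def per_sample_kl_ar1(n, rho):
    """(1/n) D(N(0, M_n) || N(0, I_n)) for the illustrative matrix M_n[i, j] = rho**|i - j|."""
    idx = np.arange(n)
    M = rho ** np.abs(np.subtract.outer(idx, idx)).astype(float)
    sign, logdet = np.linalg.slogdet(M)
    if sign <= 0:
        raise ValueError("M_n must be positive definite")
    return 0.5 * (np.trace(M) - n - logdet) / n

rho = 0.6
limit = -0.5 * np.log(1 - rho ** 2)
for n in (10, 100, 400):
    print(n, per_sample_kl_ar1(n, rho), limit)
```

The gap to the limit here is of order 1/n; for general stationary covariance sequences the existence of such limits is the kind of regularity that Assumption I postulates rather than proves.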
1.3 Main results
We first make an important explanation.
Remark 2. The following technical problem arises when describing the maximal sets . Relation (9) is asymptotic (as ) in character. Therefore the maximal sets can also be described only asymptotically (as ). For that purpose it is most convenient to describe the simplest sequence of sets that yields the maximal sets in the limit.
In this paper, for a -matrix we denote . By we denote the inner product of vectors . We write , if is positive definite.
Let be the set of all -covariance (i.e., symmetric and positive definite) matrices in . For any , and any , define the function
[TABLE]
where
[TABLE]
For a sequence of pairs introduce the following sequence of sets of pairs :
[TABLE]
where the function is defined in (20).
The following theorem is the main result of the paper. It describes the sets and from (10) and (15), respectively.
Theorem 1. If assumptions (18), (19) hold, then as
[TABLE]
where the equalities are understood in the sense of Remark 2.
Remark 3. Clearly, . Moreover, the sets and are convex in . Indeed, it is known (see [6, § 8.5, Theorem 4] and [7, Theorem 7.6.7]) that the function is strictly concave on the convex set of positive definite symmetric matrices in . Therefore the set is convex, i.e., any matrices and satisfy the condition
[TABLE]
In a sense, is the set enlarged by a “thin slice” whose width is of order . In other words, can be regarded as the “core” of the set .
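Remark 3 rests on the concavity of ln det on positive definite symmetric matrices. A quick numerical spot check (not a proof), assuming NumPy: it samples random pairs of positive definite matrices and a random weight, and records the smallest value of ln det(tA + (1-t)B) - t ln det A - (1-t) ln det B, which concavity says is never negative. The sampling scheme and helper names are illustrative.

```python
import numpy as np

def logdet_spd(A):
    """ln det A for a symmetric positive definite A; Cholesky raises LinAlgError otherwise."""
    L = np.linalg.cholesky(A)
    return 2.0 * np.sum(np.log(np.diag(L)))

def random_spd(rng, n):
    """Illustrative random symmetric positive definite matrix."""
    B = rng.standard_normal((n, n))
    return B @ B.T + 0.1 * np.eye(n)

rng = np.random.default_rng(2)
n = 6
worst_gap = np.inf
for _ in range(1000):
    A, B = random_spd(rng, n), random_spd(rng, n)
    t = rng.uniform()
    # concavity: ln det(tA + (1-t)B) >= t ln det A + (1-t) ln det B
    gap = logdet_spd(t * A + (1 - t) * B) - (t * logdet_spd(A) + (1 - t) * logdet_spd(B))
    worst_gap = min(worst_gap, gap)
print("smallest concavity gap:", worst_gap)
```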
We also present the following simplifying consequence of Theorem 1. Without loss of generality, we may assume that the matrix is diagonal (see Remark 1) with the eigenvalues (all positive). We also restrict ourselves in (23) to diagonal matrices with positive eigenvalues . The matrix is diagonal with the eigenvalues :
[TABLE]
Then for , we have from (21)
[TABLE]
Introduce the convex set of diagonal, positive definite matrices :
[TABLE]
If , then the function from (20) takes the form
[TABLE]
where are defined in (24) and is defined in (25). It is also assumed that , .
For a sequence of pairs , , introduce the following set of pairs , :
[TABLE]
where the function is defined in (26).
Then the following “inner bound” for holds.
Theorem 2. If assumptions (18), (19) hold, then the set contains the set :
[TABLE]
where the set is defined in (27).
The set is convex in (see Remark 3).
Further, the auxiliary Theorem 3 is given in Section 2. Theorem 1 is proved in Section 3, and some particular cases of the problem are considered as examples in Section 4.
2 Auxiliary Theorem
In models (1), (2), we first consider testing simple hypotheses: the pair versus a pair . Denote
[TABLE]
The next theorem is the main auxiliary result of this paper. Its proof follows that of Theorem 3 in [8]. A more general result is contained in [10].
Theorem 3. For the minimal possible , , the following bounds are valid:
[TABLE]
and
[TABLE]
where is defined by the relation
[TABLE]
Note that both bounds (29) and (30) are purely analytical relations, involving no limiting operations. The lower bound (29) and the upper bound (30) are close to each other if the value is much smaller than (which is usually of order ).
The next result gives an upper bound of order , , for the value from (31). Its proof (see the Appendix) follows that of Lemma 1 in [8].
Lemma 1. For from (30) the following upper bound holds (see (19)):
[TABLE]
3 Proof of Theorem 1
Since , to prove Theorem 1 it suffices to obtain the “inner bound” for and then a similar “outer bound” for .
3.1 “Inner bound” for
We first upper-bound the value . For that purpose, in the model (2) we consider testing the simple hypothesis against the simple alternative , where is known. We use the optimal LR-test with the decision region in favor of (see (13), (14)), where is defined in (31). Consider another pair and evaluate the 2nd-kind error probability when the decision region is used. Then
[TABLE]
where . By assumption (19) and estimate (32), we have
[TABLE]
Therefore, if
[TABLE]
[TABLE]
3.2 “Outer bound” for
Now we obtain a similar lower bound for . Consider first testing the simple hypothesis against the simple alternative . We use the optimal LR-test with the decision region in favor of (see (13), (14)). Then, denoting and , we have for the error probabilities
[TABLE]
Consider another pair . Let be a decision region in favor of , and let and be the corresponding error probabilities. Then, denoting , for the 2nd-kind error probability we need (see (37))
[TABLE]
For some , , consider also the probability density
[TABLE]
and the corresponding value for it:
[TABLE]
[TABLE]
Note that the probability density corresponds to the Bayesian problem statement in which the alternative hypothesis coincides with with probability and with with probability . The value is the corresponding 2nd-kind error probability.
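The Bayesian reading of the mixture density is linear: for any fixed decision region, its probability under the mixture equals the same convex combination of its probabilities under the two components, which is why the mixture's 2nd-kind error is the corresponding weighted average of the two 2nd-kind errors. A one-dimensional sketch (standard library only) with hypothetical Gaussian components, weight `t`, and half-line region {y > c} checks this by numerical integration; none of these numbers come from the paper.

```python
import math

def normal_pdf(y, m, s):
    return math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def normal_tail(c, m, s):
    """P(Y > c) for Y ~ N(m, s**2), via erfc for accuracy in the tail."""
    return 0.5 * math.erfc((c - m) / (s * math.sqrt(2.0)))

def midpoint_integral(f, lo, hi, steps=200_000):
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

# Hypothetical components P1 = N(1.0, 1.5^2), P2 = N(-0.5, 0.8^2), weight t, region {y > c}
(m1, s1), (m2, s2), t, c = (1.0, 1.5), (-0.5, 0.8), 0.3, 0.7

def mixture(y):
    return t * normal_pdf(y, m1, s1) + (1 - t) * normal_pdf(y, m2, s2)

p_mixture = midpoint_integral(mixture, c, c + 30.0)   # mass beyond c + 30 is negligible
p_combined = t * normal_tail(c, m1, s1) + (1 - t) * normal_tail(c, m2, s2)
print(p_mixture, p_combined)
```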
We lower-bound the value . First, we have
[TABLE]
For the last term on the right-hand side of (42) we have
[TABLE]
Therefore we get
[TABLE]
Consider the value on the right-hand side of (43). Denoting
[TABLE]
[TABLE]
Therefore
[TABLE]
where
[TABLE]
Therefore, by (41), (45) and (46) we need to have
[TABLE]
Note that since , from (46) we have
[TABLE]
Therefore, for (47) to hold, we need
[TABLE]
Since , the relation (48) is equivalent to the condition
[TABLE]
Note that
[TABLE]
Then, for (49) to hold, we need at least
[TABLE]
Setting , we get from (50) the necessary condition
[TABLE]
which gives the “outer bound” for (see (23)).
Note that the “inner bound” (35), (36) for coincides with (51). Therefore, to finish the proof of Theorem 1, it remains to express the condition (51) analytically in terms of the matrices and means . For that purpose we use the following result.
Lemma 2.
[TABLE]
where the function is defined in (20).
If the matrix is not positive definite, then
[TABLE]
Proof. Denoting
[TABLE]
we get by (11)
[TABLE]
Note that (see (54))
[TABLE]
where (see also (21))
[TABLE]
Therefore, we can continue (54) as follows:
[TABLE]
Consider the integral on the right-hand side of (55). If , then [6, § 6.9, Theorem 3]
[TABLE]
Otherwise
[TABLE]
Assume first , i.e., the matrix is positive definite. Then, by (55), (56) we get
[TABLE]
If the matrix is not positive definite, then by (57)
[TABLE]
and therefore condition (51) cannot be satisfied. Lemma 2 follows from (58) and (59). ∎
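The step from (55) to (56), (57) uses the classical Gaussian integral: for a symmetric matrix A and a vector b, the integral over R^n of exp(-y'Ay/2 + b'y) equals (2π)^(n/2) (det A)^(-1/2) exp(b'A^(-1)b/2) when A is positive definite, and diverges otherwise. We take this to be the content of the cited [6, § 6.9, Theorem 3]; the normalization used in (56) may differ. The sketch, assuming NumPy, checks the closed form against a fine Riemann sum in two dimensions; the matrix, vector, grid, and the name `gaussian_integral` are illustrative.

```python
import numpy as np

def gaussian_integral(A, b):
    """Integral over R^n of exp(-y'Ay/2 + b'y); infinite unless A is positive definite."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = b.size
    if np.linalg.eigvalsh(A).min() <= 0:
        return np.inf                  # the integrand fails to decay along some direction
    _, logdet = np.linalg.slogdet(A)
    quad = b @ np.linalg.solve(A, b)   # b' A^{-1} b, from completing the square
    return (2 * np.pi) ** (n / 2) * np.exp(-0.5 * logdet + 0.5 * quad)

A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([0.3, -0.2])
h = 0.02
grid = np.arange(-10.0, 10.0 + h / 2, h)
Y1, Y2 = np.meshgrid(grid, grid, indexing="ij")
quad_form = A[0, 0] * Y1 ** 2 + 2 * A[0, 1] * Y1 * Y2 + A[1, 1] * Y2 ** 2
numeric = np.exp(-0.5 * quad_form + b[0] * Y1 + b[1] * Y2).sum() * h * h
closed = gaussian_integral(A, b)
print(numeric, closed)
```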
We continue the proof of Theorem 1. Define as the maximal set satisfying the condition
[TABLE]
This set coincides with the one defined in (22). Therefore Theorem 1 follows from (35), (51), (52), and (60).
4 Examples. Particular cases
4.1 Known mean and known covariance matrix
We first consider the simplest case of a known mean and a known matrix and apply Theorem 3. This will allow us to estimate the rate of convergence in Theorem 1. Without loss of generality, we may assume in model (2) that the covariance matrix is diagonal with positive eigenvalues (see Remark 1). Then (see (17))
[TABLE]
[TABLE]
where is estimated in (32).
To estimate more simply than in (32), we additionally assume that the following condition is satisfied:
III. There exists , such that
[TABLE]
Then, by Chebyshev’s inequality, we have
[TABLE]
For the right-hand side of (64) not to exceed , it suffices to set
[TABLE]
and then (62) takes the form
[TABLE]
which estimates the rate of convergence in (62).
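This subsection controls the deviation of the log-likelihood ratio from its mean (the divergence) by Chebyshev’s inequality. Displays (61)–(65) are not recoverable here, so the following is a hedged sketch under our own assumptions (NumPy; P = N(a, diag(lam)) is the measure under which the ratio is evaluated, Q = N(0, I_n); the radius t is chosen so that var / t^2 = eps). Writing y_i = a_i + sqrt(lam_i) z_i, each summand of ln dP/dQ equals -ln(lam_i)/2 + a_i^2/2 + a_i sqrt(lam_i) z_i + (lam_i - 1) z_i^2/2, which gives the mean and variance coded below; a Monte Carlo run checks both and the Chebyshev guarantee. All parameter values are illustrative.

```python
import numpy as np

def llr(y, a, lam):
    """ln p(y)/q(y) for P = N(a, diag(lam)) and Q = N(0, I_n); one value per row of y."""
    return np.sum(-0.5 * np.log(lam) - (y - a) ** 2 / (2.0 * lam) + y ** 2 / 2.0, axis=-1)

def llr_moments(a, lam):
    """Exact mean (equal to D(P || Q)) and variance of the LLR when y ~ P."""
    a, lam = np.asarray(a, float), np.asarray(lam, float)
    mean = 0.5 * np.sum(lam - 1.0 - np.log(lam)) + 0.5 * np.sum(a ** 2)
    var = np.sum(a ** 2 * lam) + 0.5 * np.sum((lam - 1.0) ** 2)
    return mean, var

def chebyshev_radius(var, eps):
    """Smallest t with var / t**2 <= eps, so P(|L - E L| >= t) <= eps by Chebyshev."""
    return np.sqrt(var / eps)

rng = np.random.default_rng(3)
n, eps, trials = 50, 0.1, 20_000
lam = rng.uniform(0.5, 2.0, n)        # illustrative eigenvalues
a = rng.normal(0.0, 0.3, n)           # illustrative mean
mean, var = llr_moments(a, lam)
t = chebyshev_radius(var, eps)

y = a + np.sqrt(lam) * rng.standard_normal((trials, n))   # samples from P
L = llr(y, a, lam)
exceed_rate = np.mean(np.abs(L - mean) >= t)
print(mean, L.mean(), var, L.var(), exceed_rate)
```

Chebyshev’s bound is typically loose (the observed exceedance rate is far below `eps`), but it needs only the variance, which is why it yields simple explicit estimates of the threshold offset.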
Note also that, similarly to (74), (75), we can obtain
[TABLE]
Therefore the condition III is equivalent to the inequality (see (61) and (65))
[TABLE]
Remark 4. Assumption (63) is fulfilled, for example, in the natural “regular” case when the elements , are “continuations” of the elements , .
4.2 Unknown mean and known covariance matrix
Consider the case of model (2) in which the covariance matrix is known but the mean is not. Without loss of generality, we may assume that the covariance matrix is diagonal with positive eigenvalues (see Remark 1). Then the function from (20) takes the form
[TABLE]
where for and we have
[TABLE]
The corresponding maximal set in that case takes the form (see (22))
[TABLE]
where the function is defined in (66).
Note that if (i.e., when the hypotheses differ only in their means ), formulas (66), (67) take an especially simple form:
[TABLE]
These results also follow from [11, 12], where this problem was considered in Hilbert and Banach spaces.
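A hedged illustration of the mean-only case, assuming that both hypotheses here have the identity covariance, i.e., they are N(0, I_n) and N(a, I_n). Then the divergence is ||a||^2/2, and the exact miss probability of the Neyman–Pearson test at level eps is Phi(z_{1-eps} - ||a||), the same whichever of the two distributions is the null. With a hypothetical constant shift a_i = c, the normalized exponent -(1/n) ln beta approaches c^2/2 from below, slowly, because the correction term is of order 1/sqrt(n). Standard library only.

```python
import math
from statistics import NormalDist

def np_miss_probability(d, eps):
    """2nd-kind error of the Neyman-Pearson test between N(0, I_n) and N(a, I_n), d = ||a||.

    The LLR a'y - d**2/2 is increasing in T = a'y/d, which is N(0, 1) under one
    hypothesis and N(d, 1) under the other, so a level-eps test gives
    beta = Phi(z_{1-eps} - d).
    """
    z = NormalDist().inv_cdf(1.0 - eps)
    # Phi(x) = erfc(-x / sqrt(2)) / 2 stays accurate deep in the lower tail
    return 0.5 * math.erfc((d - z) / math.sqrt(2.0))

c, eps = 0.5, 0.05            # hypothetical per-coordinate shift a_i = c and level eps
limit = c ** 2 / 2            # per-sample divergence ||a||^2 / (2n)
exponents = []
for n in (10, 100, 1000, 4000):
    beta = np_miss_probability(c * math.sqrt(n), eps)
    exponents.append(-math.log(beta) / n)
print(exponents, limit)
```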
4.3 Known mean and unknown covariance matrix
We limit ourselves to the case . Then the function from (20) for takes the form
[TABLE]
The corresponding maximal set in that case takes the form (see (22))
[TABLE]
Formulas (69), (70) coincide with the corresponding results in [8, Theorem 1].
Appendix. Proof of Lemma 1
Let be a Gaussian random vector with distribution , and let be a symmetric -matrix with eigenvalues . Consider the quadratic form . There exists an orthogonal matrix such that , where is the diagonal matrix with diagonal elements [6, § 4.7]. Since , the quadratic forms and have the same distribution. Therefore, by formula (12), we have
[TABLE]
where
[TABLE]
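The rotation argument at the start of this proof can be checked numerically. Assuming NumPy, the sketch diagonalizes an illustrative symmetric matrix B = T diag(mu) T', confirms that xi'B xi coincides pathwise with sum_i mu_i eta_i^2 for the rotated vector eta = T'xi (which is again N(0, I_n)), and compares the sample mean and variance with tr B = sum_i mu_i and 2 tr B^2 = 2 sum_i mu_i^2. It does not reproduce (71), (72) themselves.

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 6, 200_000
C = rng.standard_normal((n, n))
B = (C + C.T) / 2.0                  # illustrative symmetric (not necessarily definite) matrix
mu, T = np.linalg.eigh(B)            # B = T @ diag(mu) @ T.T

xi = rng.standard_normal((trials, n))              # rows are samples of xi ~ N(0, I_n)
q_direct = np.einsum("ij,jk,ik->i", xi, B, xi)     # xi' B xi for each sample
eta = xi @ T                                       # rows are (T' xi)', again N(0, I_n)
q_rotated = (eta ** 2) @ mu                        # sum_i mu_i * eta_i**2

mean_theory = np.sum(mu)             # equals trace(B)
var_theory = 2.0 * np.sum(mu ** 2)   # equals 2 * trace(B @ B)
print(q_direct.mean(), mean_theory, q_direct.var(), var_theory)
```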
Introduce the value (see (31))
[TABLE]
Then, by (71), (72), and (17), for from (73) we have
[TABLE]
where
[TABLE]
To estimate the value in (75), we use the following result [13, Ch. III.5.15]: let be independent random variables with , . Then for any
[TABLE]
Therefore, using Chebyshev’s inequality for and (76), we get
[TABLE]
To estimate the value in (74), (75), note that
[TABLE]
and then
[TABLE]
Therefore, using the standard bound
[TABLE]
we get ()
[TABLE]
To satisfy the condition , we set such that . Then, by (77) and (78), it suffices to choose satisfying (32).
FUNDING
Supported in part by the Russian Foundation for Basic Research, project no. 19-01-00364.
- [1] Wald, A., Statistical Decision Functions, New York: Wiley, 1950. Translated under the title Statisticheskie reshayushchie funktsii, in Pozitsionnye igry (Positional Games), Moscow: Nauka, 1967, pp. 300–522.
- [2] Lehmann, E.L., Testing Statistical Hypotheses, New York: Wiley, 1959. Translated under the title Proverka statisticheskikh gipotez, Moscow: Nauka, 1979.
- [3] Poor, H.V., An Introduction to Signal Detection and Estimation, New York: Springer-Verlag, 1994, 2nd ed.
- [4] Zhang, W. and Poor, H.V., On Minimax Robust Detection of Stationary Gaussian Signals in White Gaussian Noise, IEEE Trans. Inform. Theory, 2011, vol. 57, no. 6, pp. 3915–3924.
- [5] Burnashev, M.V., On Detection of Gaussian Stochastic Sequences, Probl. Peredachi Inf., 2017, vol. 53, no. 4, pp. 49–68 [Probl. Inf. Transm. (Engl. Transl.), 2017, vol. 53, no. 4, pp. 349–367].
- [6] Bellman, R., Introduction to Matrix Analysis, New York: McGraw-Hill, 1960. Translated under the title Vvedenie v teoriyu matrits, Moscow: Nauka, 1976.
- [7] Horn, R.A. and Johnson, C.R., Matrix Analysis, Cambridge: Cambridge Univ. Press, 1985. Translated under the title Matrichnyi analiz, Moscow: Mir, 1989.
- [8] Burnashev, M.V., On Minimax Detection of Gaussian Stochastic Sequences and Gaussian Stationary Signals, Probl. Peredachi Inf., 2021, vol. 57, no. 3, pp. 55–72 [Probl. Inf. Transm. (Engl. Transl.), 2021, vol. 57, no. 3, pp. 248–264].
