Model Order Selection Rules For Covariance Structure Classification
V. Carotenuto, A. De Maio, D. Orlando, and P. Stoica

TL;DR
This paper develops adaptive rules for classifying covariance matrix structures in radar, using model order selection criteria to improve detection accuracy amid uncertainties.
Contribution
It introduces a framework applying MOS techniques like AIC, TIC, and BIC for covariance structure classification in radar signal processing.
Findings
Effective model selection rules demonstrated
Comparison of MOS techniques discussed
Improved classification accuracy shown
| Parameter | Case 1 () | Case 2 () |
|---|---|---|
| | 13 | 13 |
| | 0.15 | 0.15 |
| | 0.85 | 0.85 |
| | 0.285 | 0.285 |
| [dB] | 30 | 20 |
| | - | 0.93 |
| | - | 0.05 |
| [dB] | - | 30 |
V. Carotenuto and A. De Maio are with the Dipartimento di Ingegneria Elettrica e delle Tecnologie dell’Informazione, Università degli Studi di Napoli “Federico II”, via Claudio 21, I-80125 Napoli, Italy. D. Orlando is with Università degli Studi “Niccolò Cusano”, via Don Carlo Gnocchi 3, 00166 Roma, Italy. P. Stoica is with the Department of Information Technology, Uppsala University, P O Box 337, SE-751 05, Uppsala, Sweden.
Abstract
The adaptive classification of the interference covariance matrix structure for radar signal processing applications is addressed in this paper. This represents a key issue because many detection architectures are synthesized assuming a specific covariance structure which may not necessarily coincide with the actual one due to the joint action of the system and environment uncertainties. The considered classification problem is cast in terms of a multiple hypotheses test with some nested alternatives and the theory of Model Order Selection (MOS) is exploited to devise suitable decision rules. Several MOS techniques, such as the Akaike, Takeuchi, and Bayesian information criteria are adopted and the corresponding merits and drawbacks are discussed. At the analysis stage, illustrating examples for the probability of correct model selection are presented showing the effectiveness of the proposed rules.
I Notation
In the sequel, vectors and matrices are denoted by boldface lower-case and upper-case letters, respectively. The symbols $\det(\cdot)$, $\mathrm{Tr}(\cdot)$, $\otimes$, $(\cdot)^{*}$, $(\cdot)^{T}$, and $(\cdot)^{\dagger}$ denote the determinant, trace, Kronecker product, complex conjugate, transpose, and conjugate transpose, respectively. As to numerical sets, $\mathbb{R}$ is the set of real numbers, $\mathbb{R}^{N\times M}$ is the Euclidean space of $(N\times M)$-dimensional real matrices (or vectors if $M=1$), $\mathbb{C}$ is the set of complex numbers, and $\mathbb{C}^{N\times M}$ is the Euclidean space of $(N\times M)$-dimensional complex matrices (or vectors if $M=1$). The symbols $\Re\{z\}$ and $\Im\{z\}$ indicate the real and imaginary parts of the complex number $z$, respectively. $\mathbf{I}_{N}$ stands for the $N\times N$ identity matrix, while $\mathbf{0}$ is the null vector or matrix of proper dimensions. We denote by $\mathbf{J}\in\mathbb{R}^{N\times N}$ a permutation matrix such that $\mathbf{J}(l,k)=1$ if and only if $l+k=N+1$. Given a matrix $\mathbf{A}=[\mathbf{a}_{1},\ldots,\mathbf{a}_{M}]\in\mathbb{C}^{N\times M}$, $\mathrm{vec}(\mathbf{A})=[\mathbf{a}_{1}^{T},\mathbf{a}_{2}^{T},\ldots,\mathbf{a}_{M}^{T}]^{T}\in\mathbb{C}^{NM\times 1}$, while given a vector $\mathbf{a}\in\mathbb{C}^{N\times 1}$, $\mathrm{diag}(\mathbf{a})\in\mathbb{C}^{N\times N}$ indicates the diagonal matrix whose $i$th diagonal element is the $i$th entry of $\mathbf{a}$.
The Euclidean norm of a vector $\mathbf{x}$ is denoted by $\|\mathbf{x}\|$. We write $\mathbf{M}\succ\mathbf{0}$ if $\mathbf{M}$ is positive definite. Let $f(\mathbf{x})\in\mathbb{R}$ be a scalar-valued function of vector argument, then $\partial f(\mathbf{x})/\partial\mathbf{x}$ denotes the gradient of $f(\cdot)$ with respect to $\mathbf{x}$ arranged in a column vector, while $\partial f(\mathbf{x})/\partial\mathbf{x}^{T}$ is its transpose. Moreover, if $\widehat{\mathbf{x}}$ belongs to the domain of $f(\cdot)$, then the gradient of $f(\cdot)$ with respect to $\mathbf{x}$ and evaluated at $\widehat{\mathbf{x}}$ is denoted by $\partial f(\widehat{\mathbf{x}})/\partial\mathbf{x}$. For a finite set stands for its cardinality. denotes the set of all unitary matrices and . For two sets, and , denotes their Cartesian product. The $(k,l)$-entry (or $l$-entry) of a generic matrix $\mathbf{A}$ (or vector $\mathbf{a}$) is denoted by $\mathbf{A}(k,l)$ (or $\mathbf{a}(l)$). Given two statistical hypotheses and , then means that is nested into . The acronym i.i.d. means independent and identically distributed while the symbol denotes statistical expectation. Finally, we write $\mathbf{x}\sim\mathcal{CN}_{N}(\mathbf{m},\mathbf{M})$ if $\mathbf{x}$ is a complex circular $N$-dimensional normal vector with mean $\mathbf{m}$ and covariance matrix $\mathbf{M}\succ\mathbf{0}$, $\mathbf{x}\sim\mathcal{N}_{N}(\mathbf{m},\mathbf{M})$ if $\mathbf{x}$ is an $N$-dimensional normal vector with mean $\mathbf{m}$ and covariance matrix $\mathbf{M}\succ\mathbf{0}$, and $\varphi\sim\mathcal{U}(0,2\pi)$ if $\varphi$ is a random variable uniformly distributed in $(0,2\pi)$.
II Introduction, Motivation, and Problem Formulation
Consider a radar system equipped with $N$ (spatial and/or temporal) channels. The echoes from the cell under test (CUT) are downconverted to baseband, pre-processed, properly sampled, and organized to form an $N$-dimensional vector, $\mathbf{z}$ say, referred to as primary data or CUT sample. A set of secondary data, $\mathbf{z}_{1},\ldots,\mathbf{z}_{K}$, with , statistically independent of $\mathbf{z}$, is also acquired in order to make the system adaptive with respect to the unknown Interference Covariance Matrix (ICM), $\mathbf{M}\succ\mathbf{0}$. As is customary, these data are assumed to share the same ICM as $\mathbf{z}$ and are obtained exploiting echoes from range cells in the proximity of the CUT within the reference window [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11].
To accomplish the detection task which is typical of the search process, the radar signal processor solves a testing problem applying a decision rule computed from the collected data (decision statistic). From a mathematical viewpoint, target detection can be formulated in terms of a binary hypothesis test and tools provided by Decision Theory can be exploited to solve it. Several design criteria have been adopted in this respect: the Generalized Likelihood Ratio Test (GLRT) [1, 12, 13, 14, 15], the Wald test [16, 17, 18, 19, 20], the Rao test [11, 19, 20, 7, 18, 22] (note that the GLRT, the Wald test, and the Rao test are, under mild conditions, asymptotically equivalent [21]), and the Invariance Principle [23, 24, 25, 26, 27, 28].
Usually a given design technique is applied under specific assumptions on the ICM structure which are tantamount to incorporating some degree of a priori knowledge at the design stage. Specifically, certain structures of the covariance can be induced by the interference type, the geometry of the system array, and/or uniformity of the transmitted pulse train. In the most general case, $\mathbf{M}\in\mathbb{C}^{N\times N}$ is Hermitian, but it is well-known that:
- ground clutter, observed by a stationary monostatic radar, often exhibits a symmetric power spectral density centered around the zero-Doppler frequency implying that the resulting ICM is real, i.e., $\mathbf{M}\in\mathbb{R}^{N\times N}$ [29];
- from a theoretical point of view, symmetrically spaced linear arrays or pulse trains induce a persymmetric structure on $\mathbf{M}$ [30]; the following two cases are possible:
  - $\mathbf{M}\in\mathbb{C}^{N\times N}$ is Hermitian and persymmetric (or centrohermitian) if and only if $\mathbf{M}=\mathbf{J}\mathbf{M}^{*}\mathbf{J}$;
  - $\mathbf{M}\in\mathbb{R}^{N\times N}$ is symmetric and persymmetric (or centrosymmetric) if and only if $\mathbf{M}=\mathbf{J}\mathbf{M}\mathbf{J}$.
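The structural conditions listed above can be checked numerically. The following is a minimal sketch, assuming NumPy; the function names, the tolerance, and the test matrices (a Hermitian Toeplitz matrix and a gain-perturbed version of it) are illustrative choices and do not come from the paper.

```python
import numpy as np

def exchange_matrix(n):
    """Permutation matrix J with ones on the anti-diagonal."""
    return np.fliplr(np.eye(n))

def is_centrohermitian(m, tol=1e-10):
    """Hermitian and persymmetric: M = M^H and M = J M^* J."""
    j = exchange_matrix(m.shape[0])
    hermitian = np.allclose(m, m.conj().T, atol=tol)
    return hermitian and np.allclose(m, j @ m.conj() @ j, atol=tol)

def is_centrosymmetric(m, tol=1e-10):
    """Real, symmetric, and persymmetric: M = M^T and M = J M J."""
    j = exchange_matrix(m.shape[0])
    real = np.allclose(np.imag(m), 0.0, atol=tol)
    symmetric = np.allclose(m, m.T, atol=tol)
    return real and symmetric and np.allclose(m, j @ m @ j, atol=tol)

# Illustrative matrices (hypothetical values).
n = 5
idx = np.arange(n)
lag = idx[:, None] - idx[None, :]
t_complex = 0.9 ** np.abs(lag) * np.exp(2j * np.pi * 0.2 * lag)  # Hermitian Toeplitz
t_real = 0.9 ** np.abs(lag)                                      # real symmetric Toeplitz
gains = np.diag([1.0, 1.1, 0.9, 1.0, 1.2])                       # unequal channel gains
t_perturbed = gains @ t_complex @ gains.conj().T                 # still Hermitian

print(is_centrohermitian(t_complex), is_centrosymmetric(t_complex))
print(is_centrohermitian(t_real), is_centrosymmetric(t_real))
print(is_centrohermitian(t_perturbed))
```

In this sketch the unequal channel gains keep the matrix Hermitian but destroy its persymmetry, which mimics the effect of calibration errors on the nominal structure.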
For each of the mentioned scenarios, there exist examples of adaptive detectors in the literature [5, 4, 31]. The knowledge about the environment as well as the structure of the ICM can guide the system operator towards the most appropriate decision scheme. In this regard, the primary sources of available information are directly related to the system and/or to the operating scenario. However, there exists a plethora of causes that introduce uncertainty and make the nominal assumptions no longer valid. For instance, array calibration errors would produce residual imbalances among channels that can heavily degrade the ICM persymmetric structure. Another example concerns the level of symmetry of the ground clutter power spectral density, which can be altered by the possible presence of a dominating Doppler or some discretes with a given velocity. This motivates the need for a classifier capable of inferring the ICM structure over the range bins of the system reference window. Its output could then be fed to a selector choosing the most suitable detection scheme as shown in Figure 1.
A possible approach to handle the mentioned classification problem is based on its formulation in terms of a multiple hypothesis test and on the use of model order selection (MOS) rules, since each possible choice for $\mathbf{M}$ represents a model with a given number of parameters [32, 33, 34, 35, 36, 37, 38, 39]. Following this idea, it is worth making explicit the relationship between parameters and model. To this end, note that the parameters introduced by the specific structure of $\mathbf{M}$ can be stacked into a vector $\boldsymbol{\theta}_{i}\in\mathbb{R}^{m_{i}\times 1}$, where $m_{i}$ depends on the specific scenario. Since the entries of $\boldsymbol{\theta}_{i}$ parameterize $\mathbf{M}$, this dependence is denoted using the notation $\mathbf{M}(\boldsymbol{\theta}_{i})$. Finally, the considered models (or hypotheses) are representative of combinations among the possible assumptions on the clutter spectrum (symmetry around zero-Doppler or the lack of the mentioned symmetry) and the system configuration (persymmetry).
In summary, the problem at hand is tantamount to choosing among the following hypotheses:
[TABLE]
The number of unknown parameters under each hypothesis is given by:
[TABLE]
For the sake of clarity, the proofs of (2) for the cases centrohermitian and centrosymmetric are provided in Appendix A.
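The parameter counts for the four covariance structures can be cross-checked numerically. The sketch below, assuming NumPy, estimates the real dimension of each structured family as the rank of a stack of random structured matrices; the function names and structure labels are hypothetical, and the positive-definiteness constraint is ignored because it does not change the dimension.

```python
import numpy as np

def random_structured(n, kind, rng):
    """Draw a random N x N matrix from one of the four structured families."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    j = np.fliplr(np.eye(n))          # exchange (anti-identity) matrix
    h = g + g.conj().T                # generic Hermitian matrix
    if kind == "hermitian":
        return h
    if kind == "real_symmetric":
        return h.real + 0j
    if kind == "centrohermitian":
        return h + j @ h.conj() @ j   # projection onto M = J M^* J
    if kind == "centrosymmetric":
        s = h.real
        return (s + j @ s @ j) + 0j   # projection onto real M = J M J
    raise ValueError(f"unknown structure: {kind}")

def real_dimension(n, kind, seed=0):
    """Number of free real parameters, estimated as a numerical rank."""
    rng = np.random.default_rng(seed)
    rows = []
    for _ in range(2 * n * n + 5):
        m = random_structured(n, kind, rng).ravel()
        rows.append(np.concatenate([m.real, m.imag]))
    return int(np.linalg.matrix_rank(np.array(rows)))

for kind in ("hermitian", "real_symmetric", "centrohermitian", "centrosymmetric"):
    print(kind, real_dimension(4, kind), real_dimension(5, kind))
```

For $N=4$ this should return 16, 10, 10, and 6 parameters, respectively, so the real symmetric and centrohermitian families have the same number of parameters even though they are different models.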
Hereafter, for brevity, we omit the dependence on $\boldsymbol{\theta}_{i}$ letting $\mathbf{M}_{i}=\mathbf{M}(\boldsymbol{\theta}_{i})$ and $\mathbf{X}_{i}=\mathbf{M}^{-1}(\boldsymbol{\theta}_{i})$.
Before concluding this section, a few remarks are in order. First, notice that different models could have the same number of parameters but, as shown in the next sections, this is not a limitation since classification rules exploit specific estimates corresponding to the different structures reflecting the assumed hypothesis. Second, it is possible to identify nested hypotheses among those listed in (1); for instance, the centrosymmetric model is nested into both the real symmetric and the centrohermitian ones, each of which is in turn nested into the Hermitian model.
In the next section, several MOS classification algorithms for problem (1) are briefly described highlighting the respective design assumptions, which might not be always met in the considered radar application. The latter observation means that the behavior of these classification rules versus the parameters of interest deserves a careful investigation. Section IV provides closed-form expressions for the classification statistics discussed in Section III. Concretely, these statistics are computed according to two approaches. The first exploits the overall data matrix which also comprises the CUT, whereas the second neglects the CUT and uses secondary data only. The performances of the considered selectors are analyzed in Section V, where the figure of merit is the probability of correct classification as a function of the number of data used for estimation. Finally, concluding remarks and future research tracks are given in Section VI. Mathematical derivations are confined to the appendices.
III Model Order Selection Criteria
The aim of this section is twofold. The first part provides useful preliminary definitions, while the second part presents a brief review of the adopted selection criteria for problem (1). Subsequent developments assume that $\mathbf{z}_{k}\sim\mathcal{CN}_{N}(\mathbf{0},\mathbf{M})$, $k=1,\ldots,K$, and $\mathbf{z}\sim\mathcal{CN}_{N}(\alpha\mathbf{v},\mathbf{M})$, where $\alpha=\alpha_{re}+j\alpha_{im}\in\mathbb{C}$ is an amplitude factor accounting for target response and propagation effects and $\mathbf{v}\in\mathbb{C}^{N\times 1}$ is the nominal steering vector. Finally, the vectors $\mathbf{z}_{1},\ldots,\mathbf{z}_{K},\mathbf{z}$ are assumed to be statistically independent.
Now, denote by $\mathbf{Z}=\left[\mathbf{z}_{1},\ldots,\mathbf{z}_{K}\right]\in\mathbb{C}^{N\times K}$ the entire secondary data matrix and let $\mathbf{p}_{i}$ be the parameter vector under the $H_{i}$ hypothesis, $i=1,\ldots,4$. Observe that
- if the CUT is incorporated into the classification rules, then $\mathbf{p}_{i}=[\boldsymbol{\theta}_{i}^{T}\ \boldsymbol{\alpha}^{T}]^{T}\in\mathbb{R}^{n_{i}\times 1}$, where $\boldsymbol{\alpha}=[\alpha_{re}\ \alpha_{im}]^{T}\in\mathbb{R}^{2\times 1}$ and $n_{i}=m_{i}+2$; in this case, we let $Z_{c}=\{\mathbf{z},\mathbf{Z}\}$;
- if the classification rules are devised from $\mathbf{Z}$ only, then $\mathbf{p}_{i}=\boldsymbol{\theta}_{i}\in\mathbb{R}^{n_{i}}$, where $n_{i}=m_{i}$; here we let $Z_{c}=\{\mathbf{Z}\}$.
Because the derivation of the MOS criteria requires the computation of the maximum likelihood estimates (MLE) of the unknown parameters as well as suitable estimates of the Fisher Information Matrix (FIM), let us provide the expressions of the probability density functions (pdfs) of $\mathbf{z}$, $\mathbf{z}_{k}$, $k=1,\ldots,K$, $\mathbf{Z}$, and the joint pdf of $\mathbf{z}$ and $\mathbf{Z}=\left[\mathbf{z}_{1},\ldots,\mathbf{z}_{K}\right]\in\mathbb{C}^{N\times K}$ under the considered hypotheses, namely, $H_{i}$, $i=1,\ldots,4$:
[TABLE]
[TABLE]
[TABLE]
[TABLE]
where $\mathbf{S}_{\alpha}=(\mathbf{z}-\alpha\mathbf{v})(\mathbf{z}-\alpha\mathbf{v})^{\dagger}$ and $\mathbf{S}=\mathbf{Z}\mathbf{Z}^{\dagger}$.
Finally, denote by $s(\mathbf{p}_{i},H_{i};\mathbf{z})=\log f\left(\mathbf{z};\mathbf{p}_{i},H_{i}\right)$, $s(\boldsymbol{\theta}_{i},H_{i};\mathbf{z}_{k})=\log f\left(\mathbf{z}_{k};\boldsymbol{\theta}_{i},H_{i}\right)$, $k=1,\ldots,K$, and let
[TABLE]
denote the log-likelihood functions (observe that $\alpha$ is a nuisance parameter with respect to problem (1)).
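As an illustration of the secondary-data log-likelihood, the following sketch evaluates the log-likelihood of $K$ i.i.d. zero-mean circular complex Gaussian vectors for a given covariance matrix, using the standard form $-KN\log\pi-K\log\det\mathbf{M}-\mathrm{Tr}(\mathbf{M}^{-1}\mathbf{S})$. It assumes NumPy; the function name and the random test data are hypothetical, and the paper's own numbered expressions are not reproduced here.

```python
import numpy as np

def secondary_loglik(m, z):
    """Log-likelihood of the columns of z (N x K), i.i.d. CN(0, m)."""
    n, k = z.shape
    sign, logdet = np.linalg.slogdet(m)
    if np.real(sign) <= 0:
        raise ValueError("covariance matrix must be positive definite")
    s = z @ z.conj().T                              # S = Z Z^H
    quad = np.trace(np.linalg.solve(m, s)).real     # Tr(M^{-1} S)
    return -k * n * np.log(np.pi) - k * logdet - quad

# Illustrative data (hypothetical sizes).
rng = np.random.default_rng(0)
n, k = 3, 8
z = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
sample_cov = z @ z.conj().T / k

print(secondary_loglik(sample_cov, z))
print(secondary_loglik(np.eye(n), z))
```

The sample covariance $\mathbf{S}/K$ maximizes this function over Hermitian positive definite matrices, so the first printed value cannot be smaller than the second.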
The remainder of this section is focused on MOS criteria. Several of such criteria have been developed for the selection of an estimated best approximating model from a set of candidates [40]; most of them rely on minimization of the Kullback-Leibler (KL) discrepancy. A well-known rule is the Akaike Information Criterion (AIC), which, with reference to problem (1), can be formulated as
[TABLE]
where is the estimated model, , and $\widehat{\mathbf{p}}_{i}$ is the MLE of $\mathbf{p}_{i}$. The main drawback of this rule is its non-zero probability of overfitting [33] due to the penalty term being too small for high-order models, especially for nested hypotheses. To overcome this limitation, an empirical modification of AIC has been proposed in [41]. This rule, referred to as Generalized Information Criterion (GIC), corrects the penalty term of AIC via a factor with , namely
[TABLE]
Note that if we set GIC reduces to AIC.
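The mechanics of a penalized-likelihood selector can be sketched as follows. This is a generic illustration, not the paper's exact rule: the penalty is written as `nu * n_params`, where `nu = 2` gives the textbook AIC and larger values give a GIC-like rule. How the paper's correction factor maps onto `nu` is not assumed, and the log-likelihood values below are made-up numbers.

```python
def penalized_score(loglik, n_params, nu=2.0):
    """Generic information criterion: -2 * log-likelihood + nu * n_params."""
    return -2.0 * loglik + nu * n_params

def select_model(logliks, n_params, nu=2.0):
    """Return the model name with the smallest penalized score, plus all scores."""
    scores = {
        name: penalized_score(logliks[name], n_params[name], nu)
        for name in logliks
    }
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical maximized log-likelihoods for two nested candidates.
logliks = {"hermitian": -100.0, "centrohermitian": -107.0}
n_params = {"hermitian": 16, "centrohermitian": 10}

aic_choice, aic_scores = select_model(logliks, n_params, nu=2.0)
gic_choice, gic_scores = select_model(logliks, n_params, nu=4.0)
print(aic_choice, aic_scores)
print(gic_choice, gic_scores)
```

With these numbers the AIC-type penalty favors the larger model, whereas the heavier penalty favors the smaller one, which illustrates how a small penalty can lead to overfitting among nested hypotheses.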
The Takeuchi Information Criterion (TIC), whose main goal is to extend AIC to mismodeling scenarios, has the following form [40]:
[TABLE]
where $\boldsymbol{\mathcal{I}}_{i}(\widehat{\mathbf{p}}_{i})\in\mathbb{R}^{n_{i}\times n_{i}}$ is the negative Hessian of the log-likelihood function evaluated at $\widehat{\mathbf{p}}_{i}$, namely the observed FIM, whose expression is
[TABLE]
and $\widehat{\boldsymbol{\mathcal{J}}}_{i}(\widehat{\mathbf{p}}_{i})$ is the sample FIM, viz.
[TABLE]
when $\mathbf{z}$ and $\mathbf{Z}$ are both considered or
[TABLE]
when only $\mathbf{Z}$ is considered. Note that, given the true model parameter vector $\bar{\mathbf{p}}$ and the true hypothesis, $\widehat{\boldsymbol{\mathcal{I}}}_{i}(\widehat{\mathbf{p}}_{i})$ and $\widehat{\boldsymbol{\mathcal{J}}}_{i}(\widehat{\mathbf{p}}_{i})$ are estimators of
[TABLE]
and
[TABLE]
respectively. It is important to observe that, in general, $\boldsymbol{\mathcal{I}}(\bar{\mathbf{p}})$ will not equal $\boldsymbol{\mathcal{J}}(\bar{\mathbf{p}})$ when the model is misspecified. However, if the model is correctly specified, then by the Information Matrix Equivalence Theorem [42], the information matrix can be expressed in either Hessian form, $\boldsymbol{\mathcal{I}}(\bar{\mathbf{p}})$, or in the outer product form, $\boldsymbol{\mathcal{J}}(\bar{\mathbf{p}})$.
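The difference between the Hessian form and the outer-product form can be seen on a toy problem. The sketch below deliberately uses a scalar real Gaussian model with unknown mean and variance, not the paper's covariance models, and computes the trace of the sample FIM times the inverse observed FIM at the MLE. Under correct specification this trace is close to the number of parameters (two here); under a heavy-tailed mismatch it is larger. NumPy is assumed, and all names and sample sizes are illustrative.

```python
import numpy as np

def tic_trace(x):
    """tr(J I^{-1}) at the MLE for the toy model x_k ~ N(mu, v)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mu = x.mean()                       # MLE of the mean
    d = x - mu
    v = np.mean(d ** 2)                 # MLE of the variance
    # Per-sample score vectors: derivatives with respect to mu and v.
    g = np.column_stack([d / v, -0.5 / v + 0.5 * d ** 2 / v ** 2])
    sample_fim = g.T @ g                # outer-product form
    cross = d.sum() / v ** 2
    observed_fim = np.array([           # negative Hessian of the log-likelihood
        [n / v, cross],
        [cross, -0.5 * n / v ** 2 + (d ** 2).sum() / v ** 3],
    ])
    return float(np.trace(sample_fim @ np.linalg.inv(observed_fim)))

rng = np.random.default_rng(0)
trace_matched = tic_trace(rng.standard_normal(200_000))      # model is correct
trace_mismatched = tic_trace(rng.laplace(size=200_000))      # heavier tails
print(trace_matched, trace_mismatched)
```

In a TIC-type rule this trace plays the role that the parameter count plays in AIC, which is why the two criteria tend to agree when the candidate model is correct and differ under mismodeling.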
Both the AIC (along with its generalization) and TIC are derived under the assumption of large samples. To relax this requirement, the corrected AIC (AICc) has been devised:
[TABLE]
It is important to note that in the considered framework the AICc is essentially a heuristic rule since it has been originally proposed for linear regression models [43] and later extended to the case of nonlinear regression and autoregressive time series [44], neither of which covers the scenarios considered herein.
Finally, other selection rules, such as the Bayesian Information Criterion (BIC), can be obtained according to a Bayesian framework. The BIC has been derived as an asymptotic approximation to a transformation of the Bayesian posterior probability of a candidate model [45]. In large-sample settings, BIC selects the model which is a posteriori most probable. It is also worth mentioning that, under some regularity conditions, BIC minimizes the KL discrepancy [40, 33]. An alternative formulation of BIC can be obtained relaxing the large-sample requirement and assuming a noninformative prior for both the parameter vector $\boldsymbol{\theta}_{i}$ and the model $H_{i}$. Under the above hypotheses, BIC can be expressed as [33, 46, 47]
[TABLE]
which, for large samples and the herein considered context, reduces to (see Subsection IV-C)
[TABLE]
We note, once again, that even though different models can share the same number of parameters, the considered selection criteria are still capable of discriminating between the different hypotheses since they use the specific MLEs together with the corresponding log-likelihood function under the current hypothesis.
Also note that the definition of large or small samples, which is important for some of the previous criteria, depends on the ratio between the number of parameters, $n_{i}$, and number of data, $K$ or $K+1$. Moreover, for the considered application, $n_{i}$ depends on $N$. Thus, the behavior of these criteria might change according to the specific application and, for this reason, has to be investigated.
For the problem under consideration, the ratio between the number of parameters and the number of samples approaches zero as the number of homogeneous secondary data, $K$, increases. However, this situation might not be realizable in practical scenarios with the consequence that the large samples assumption would be no longer valid. Finally, the presence of outliers, clutter-edges, and/or regions with highly varying reflectivity can make the assumption that the true model belongs to the family of candidates fail. Thus, given these uncertainty factors, it is worthwhile investigating the considered MOS rules to determine which one performs better than the others. This is the scope of the next sections.
IV Computation of MOS Decision Rules
This section contains the derivation of the explicit expressions of the aforementioned classification rules. Specifically, we follow two approaches: Approach A jointly exploits secondary and primary data, whereas Approach B relies on secondary data only. The former processes an additional data vector (primary data) with respect to the latter, but the number of unknown parameters increases due to the presence of the target complex amplitude. Moreover, the estimate of the target response represents an additional computational load for the rules based on the full data, which requires the computation of the decision statistics for each look direction. In contrast to this, Approach B does not depend on the system steering vector and, hence, the classification schemes can be evaluated irrespective of the current steering direction. The above strategies are described in the next two subsections, whereas the last subsection provides the expression of BIC for large values of $K$.
IV-A MOS Decision Rules Using the Entire Data Matrix
It follows from Section III that the ingredients needed to construct a MOS decision rule are the MLEs of the unknown parameters, the log-likelihood functions, and the matrices $\widehat{\boldsymbol{\mathcal{I}}}_{i}(\mathbf{p}_{i})$ and $\widehat{\boldsymbol{\mathcal{J}}}_{i}(\mathbf{p}_{i})$. Evidently the mathematical expressions for all the above quantities depend on which model ($H_{i}$) is assumed.
The log-likelihood functions can be easily obtained from (3), (4), and (6), namely
[TABLE]
[TABLE]
[TABLE]
where $\mathbf{S}_{k}=\mathbf{z}_{k}\mathbf{z}_{k}^{\dagger}$.
The next step towards the derivation of the MOS statistics consists in evaluating the gradients of $s(\mathbf{p},H_{i};\mathbf{z})$ and $s(\mathbf{p},H_{i};\mathbf{z}_{k})$, $k=1,\ldots,K$, which are required to compute $\widehat{\boldsymbol{\mathcal{J}}}_{i}(\mathbf{p}_{i})$. More precisely, observe that
[TABLE]
and
[TABLE]
In Appendix B, it is shown that
[TABLE]
where $\mathbf{C}_{i}\in\mathbb{C}^{N^{2}\times m_{i}}$ is a transformation matrix that depends on the specific structure of $\mathbf{M}_{i}$ and on how $\boldsymbol{\theta}_{i}$ is defined (see also Appendix B),
[TABLE]
and
[TABLE]
Now, we move to the evaluation of the Hessian of $s(\mathbf{p}_{i},H_{i};\mathbf{z},\mathbf{Z})$, which can be partitioned as follows
[TABLE]
where
[TABLE]
$\mathbf{H}_{\alpha\alpha,i}=-2\mathbf{v}^{\dagger}\mathbf{X}_{i}\mathbf{v}\,\mathbf{I}_{2}$, and if $\mathbf{M}_{i}$ is Hermitian
[TABLE]
while if $\mathbf{M}_{i}$ is symmetric
[TABLE]
The proofs of the above statements are provided in Appendix C.
The final step consists in replacing the unknown parameters, namely $\alpha$ and $\boldsymbol{\theta}_{i}$, with suitable estimates. Since, to the best of our knowledge, the ML estimates of the unknown parameters are not always available in closed form, we replace them with consistent estimates as follows. For the ICM, we use the ML estimates obtained from secondary data only. As to $\alpha$, its estimate is obtained according to the ML rule assuming known ICM and then replacing the ICM with the corresponding consistent estimate. Thus, when the ICM is unstructured, namely under $H_{1}$, the estimates of the ICM and $\alpha$ are [2]
[TABLE]
respectively. When is assumed, the ICM is unstructured and real. Thus, following the lead of [5], we use the following estimates
[TABLE]
and
[TABLE]
[TABLE]
The persymmetric structure of the ICM, which occurs under $H_{3}$, yields the following estimates [4]
[TABLE]
[TABLE]
where $\mathbf{z}_{e}=(\mathbf{z}+\mathbf{J}\mathbf{z}^{*})/2$ and $\mathbf{z}_{o}=(\mathbf{z}-\mathbf{J}\mathbf{z}^{*})/2$.
Finally, the estimates under $H_{4}$ can be obtained exploiting the results in [31], namely
[TABLE]
[TABLE]
where $\mathbf{V}=[\Re\{\mathbf{v}\}\ \Im\{\mathbf{v}\}]$, $\mathbf{Z}_{e}=[\Re\{\mathbf{z}_{e}\}\ \Im\{\mathbf{z}_{e}\}]$, and $\mathbf{Z}_{o}=[\Re\{\mathbf{z}_{o}\}\ \Im\{\mathbf{z}_{o}\}]$.
IV-B MOS Decision Rules Using Secondary Data Only
Here we derive the expressions for the terms needed to compute the MOS rules based on secondary data only. To this end, we rely on the previous results. More precisely, first recall that $\mathbf{p}_{i}=\boldsymbol{\theta}_{i}$, $i=1,\ldots,4$, and the log-likelihood function is given by (see (20))
[TABLE]
Moreover, the observed FIM and the sample FIM become
[TABLE]
and
[TABLE]
respectively, where $\widehat{\boldsymbol{\theta}}_{i}$ is the ML estimate of $\boldsymbol{\theta}_{i}$ under $H_{i}$. Note that, as opposed to Approach A, in this case closed-form expressions for the ML estimates are available and they are precisely given by the expressions presented in the previous subsection (see (31)-(37)). Finally, to evaluate the gradient of $s(\mathbf{p}_{i},H_{i};\mathbf{z}_{k})$, we can use (25) and, for the Hessian of $s(\boldsymbol{\theta}_{i},H_{i};\mathbf{Z})$, we use (28) after replacing $\mathbf{S}+\mathbf{S}_{\alpha}$ with $\mathbf{S}$.
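A possible way to compute structured covariance estimates from secondary data only is sketched below. The four expressions follow from the usual argument that the Gaussian log-likelihood depends on the data only through the projection of $\mathbf{S}$ onto the assumed structure. They are written here as an illustration and are not copied from the paper's numbered equations. NumPy is assumed, and the function name, dictionary keys, and test data are hypothetical.

```python
import numpy as np

def structured_ml_estimates(z):
    """Covariance estimates from secondary data z (N x K) for four structures."""
    n, k = z.shape
    j = np.fliplr(np.eye(n))                     # exchange matrix J
    s = z @ z.conj().T                           # S = Z Z^H
    s_per = 0.5 * (s + j @ s.conj() @ j)         # persymmetric part of S
    return {
        "hermitian": s / k,                      # unstructured
        "real_symmetric": s.real / k,            # real-valued ICM
        "centrohermitian": s_per / k,            # Hermitian persymmetric
        "centrosymmetric": s_per.real / k,       # real symmetric persymmetric
    }

# Illustrative data: white circular complex Gaussian noise (hypothetical sizes).
rng = np.random.default_rng(1)
n, k = 4, 40
z = (rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))) / np.sqrt(2)
estimates = structured_ml_estimates(z)
for name, m in estimates.items():
    print(name, np.round(np.linalg.eigvalsh(m), 3))
```

Because the four families are nested, the maximized likelihood can only decrease when moving from the Hermitian estimate to the more constrained ones; the penalty term of the selection rule is what makes a constrained model preferable when the data support it.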
IV-C BIC for Large K
In this subsection, we specialize (17) in the limit of $K\rightarrow\infty$. To this end, we first consider Approach A and approximate the penalty term of BIC as
[TABLE]
where $\mathcal{O}(1)$ represents a term that tends to a constant as $K\rightarrow\infty$. The limiting approximation in (42) was obtained using the following asymptotic equalities
[TABLE]
in the expression of $\mathbf{H}_{\theta\theta,i}/(K+1)$, see (28), and observing that $\mathbf{H}_{\alpha\alpha,i}$, (29), and (30) do not depend on $K$. As a consequence, the following equalities hold
[TABLE]
[TABLE]
where $\mathbf{C}\succ\mathbf{0}$ does not depend on $K$. Therefore, neglecting the $\mathcal{O}(1)$ term, (17) becomes (18). Observe that the above criterion is also valid in the case where the CUT is not used (i.e., Approach B). As a matter of fact, the expression of asymptotic BIC for the latter case can be obtained considering $\mathbf{H}_{\theta\theta,i}$ only and repeating the above arguments.
V Numerical Examples and Discussion
This section is devoted to the analysis of the classification schemes presented in the previous sections. The metric used to assess their performance is the Probability of Correct Classification, estimated under each hypothesis by means of standard Monte Carlo counting techniques over independent trials.
The interference is modeled as circular complex normal random vectors with the following covariance matrix
[TABLE]
where $\sigma^{2}_{n}\mathbf{I}$ represents the thermal noise component with $\sigma^{2}_{n}$ being its power, $\mathbf{R}_{i}$ accounts for the clutter contributions and incorporates the clutter power, and $\mathbf{A}_{i}$ is a matrix factor modeling possible array channel errors such as, for instance, amplification and/or delay errors, calibration residuals, and mutual coupling [29]. The specific instances of $\mathbf{A}_{i}$ and $\mathbf{R}_{i}$ depend on which hypothesis is in force as shown below.
Different interference sources (with exponentially shaped covariance) are encompassed by $\mathbf{R}_{i}$, whose $(h,k)$th entry has the following expression
[TABLE]
where, for the th interference source, CNR is the Clutter-to-Noise Ratio, is the one-lag correlation coefficient, and is the normalized Doppler frequency. Finally, is the number of interference sources. For each hypothesis, we choose {\mbox{\boldmathR}}_{i} and {\mbox{\boldmathA}}_{i} as follows
- under $H_{1}$: $\mathbf{A}_{1}=\mathbf{I}+\sigma_{d}\mathbf{W}_{1}$, , , where and $\mathbf{W}_{1}(h,k)\sim\mathcal{CN}_{1}(0,1)$ i.i.d.;
- under $H_{2}$: $\mathbf{A}_{2}=\mathbf{I}+\sigma_{d}\mathbf{W}_{2}$, , , where and $\mathbf{W}_{2}(h,k)\sim\mathcal{N}_{1}(0,1)$ i.i.d.;
- under $H_{3}$: $\mathbf{A}_{3}=\mathbf{I}$, , ;
- under $H_{4}$: $\mathbf{A}_{4}=\mathbf{I}$, , .
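The interference model described above can be sketched in code under explicit assumptions. The snippet assumes that each clutter term has entries of the form $\mathrm{CNR}\cdot\rho^{|h-k|}e^{j2\pi f(h-k)}$ with unit noise power, and that the channel-error matrix acts as $\sigma_{n}^{2}\mathbf{I}+\mathbf{A}\mathbf{R}\mathbf{A}^{\dagger}$. Both forms are common choices and are assumptions here, since the exact expressions are not reproduced in this text. The function names and numerical values are illustrative and are not the settings of Table I.

```python
import numpy as np

def exponential_clutter_cov(n, cnr_db, rho, f_norm):
    """Sum of exponentially shaped clutter terms (assumed form, unit noise power)."""
    idx = np.arange(n)
    lag = idx[:, None] - idx[None, :]
    r = np.zeros((n, n), dtype=complex)
    for c_db, p, f in zip(np.atleast_1d(cnr_db), np.atleast_1d(rho), np.atleast_1d(f_norm)):
        r += 10.0 ** (c_db / 10.0) * p ** np.abs(lag) * np.exp(2j * np.pi * f * lag)
    return r

def interference_cov(r, a=None, noise_power=1.0):
    """Assumed composition: noise_power * I + A R A^H (A defaults to identity)."""
    n = r.shape[0]
    if a is None:
        a = np.eye(n)
    return noise_power * np.eye(n) + a @ r @ a.conj().T

# Illustrative settings (hypothetical values).
rng = np.random.default_rng(0)
n = 8
r = exponential_clutter_cov(n, cnr_db=30.0, rho=0.9, f_norm=0.1)
w = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
m_ideal = interference_cov(r)                              # no channel errors
m_perturbed = interference_cov(r, a=np.eye(n) + 0.1 * w)   # random channel errors

j = np.fliplr(np.eye(n))
print(np.allclose(m_ideal, j @ m_ideal.conj() @ j))          # persymmetry kept
print(np.allclose(m_perturbed, j @ m_perturbed.conj() @ j))  # persymmetry lost
```

Under these assumptions the error-free matrix is Hermitian Toeplitz and hence persymmetric, while a random complex channel-error matrix yields a generic Hermitian covariance; a zero Doppler frequency would additionally make the clutter term real.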
As to the target signature, we choose with \varphi\sim\mbox{\mathcal{U}}(0,2\pi) and SNR dB is the Signal-to-Noise Ratio, whereas, the steering vector is chosen such that
[TABLE]
assuming odd and . Finally, two study cases are considered: Case 1 assumes , i.e., only one clutter source is considered; Case 2 considers , i.e., two clutter types with different powers are assumed. The latter case can arise in scenarios where the radar swath contains an edge separating two types of clutter sources (e.g., ground and sea clutter). The considered parameter settings are described in Table I.
Figures 2 and 3 refer to Case 1 and contain the curves for Approach A and B, respectively. Inspection of the first figure highlights that AICc and GIC with exhibit poor performance under for and under for . This behavior is presumably due to the fact that in the current context AICc, as already stated, is heuristic, while the performance of GIC depends on the value of . Moreover, under and , BIC requires secondary data to achieve reasonable classification performances. Recall that BIC uses an estimate of the FIM. The remaining classification schemes guarantee a above over the considered range of values for . The described trend remains the same in Figure 3 except for a performance degradation for some architectures (such as AIC and TIC) when is low. The behavior of the considered rules can also be studied analyzing the classification percentages for each hypothesis. To this end, in Figure 4, we plot the percentages of classification by means of histograms for Approach A and assuming . The inspection of the figure shows that under (or ), some MOS rules decide for (or ) and vice versa. In other words, the misclassification occurs between and or between and . Finally, note that including the CUT in the MOS classification rules (Approach A) leads to better performances than those obtained by means of Approach B.
In Figures 5 and 6, the curves for Case 2 are reported. The behavior of the classification rules is similar to that observed in the previous figures with the difference that BIC suffers performance degradation for low values of under only.
From the inspection of all the above figures, it turns out that there does not exist a specific choice which provides the highest under all the considered settings and parameters range. However, the analysis underlines that the classification performances of some rules, in particular the AICc and GIC with , are poor for low values of and this drawback could be a reason to discard these architectures when and for the considered parameters setting. In contrast to this, TIC and BIC classification schemes are capable of guaranteeing when in all the considered conditions. However, these rules become somewhat unstable when ; this behavior may be due to the fact that the observed and sample FIM are less reliable when takes on relatively small values. Finally, the Asymptotic BIC and GIC with provide the highest performance even for low values of . The similarity in performance of these rules is due to the penalty terms whose values are close to each other for the considered parameters (i.e., for ). However, the hyperparameter of GIC is a degree of freedom that has to be suitably set (in fact, the GIC with has the worst performance), and there does not exist a general tuning criterion which allows us to choose the best value for . On the other hand, the asymptotic BIC, which does not require any hyperparameter setting, stems as a reasonable operational choice at least for the considered scenarios.
VI Conclusions
This paper has considered the classification of the interference covariance structure, which is of primary concern in some radar signal processing applications. Starting from a set of multivariate radar observations, the classification has been formulated as a multiple hypotheses test with some nested instances characterized by a different number of parameters. Several MOS rules, based on different theoretical criteria, have been devised to perform the covariance structure selection. In addition, both the joint use of primary and secondary data and the use of secondary data only have been considered for implementing the classification rules. At the analysis stage, their performance has been assessed in two different operational scenarios, highlighting the merits and the drawbacks of each approach. The classification curves, together with complexity and stability considerations, have singled out the Asymptotic BIC based on secondary data only as the recommended selector for the considered scenarios.
Finally, two possible future research tracks deserve attention. First, we will study the effect of the proposed MOS techniques for ICM structure selection on target detection performance. Some preliminary results in this direction are encouraging: they show that using the proposed techniques leads to performance close to that of the oracle target detector that knows the actual structure of the ICM. Second, an analysis on real radar data is essential to establish the effectiveness of the proposed approach.
Appendix A Number of Parameters when $\boldsymbol{M}$ is Centrohermitian or Centrosymmetric
Assume that $\boldsymbol{M}\in\mathbb{R}^{N\times N}$ is centrosymmetric with $N$ even and let $m=N/2$; then $\boldsymbol{M}$ can be partitioned as follows [48]
[TABLE]
where $\boldsymbol{A}\in\mathbb{R}^{m\times m}$ is symmetric, $\boldsymbol{B}\in\mathbb{R}^{m\times m}$ is persymmetric, and the partition also involves an $m$-dimensional permutation matrix. It is clear that
- the number of parameters defining $\boldsymbol{A}$ is $m(m+1)/2$;
- the number of parameters defining $\boldsymbol{B}$ is $m(m+1)/2$.
Thus, $\boldsymbol{M}$ can be represented by means of
$$m(m+1)=\frac{N}{2}\left(\frac{N}{2}+1\right)$$
parameters.
In the case where $N$ is still even and $\boldsymbol{M}\in\mathbb{C}^{N\times N}$ is centrohermitian, $\boldsymbol{M}$ has the following representation
[TABLE]
where $\boldsymbol{A}\in\mathbb{C}^{m\times m}$ is Hermitian and $\boldsymbol{B}\in\mathbb{C}^{m\times m}$ is persymmetric. It follows that
- the number of real-valued parameters defining $\boldsymbol{A}$ is $m^{2}$;
- the number of real-valued parameters defining $\boldsymbol{B}$ is $m(m+1)$.
The total number of parameters is
$$m^{2}+m(m+1)=\frac{N(N+1)}{2}.$$
In order to complete the proof, assume that $N$ is odd and let $m=(N-1)/2$. Following the lead of [49], a centrosymmetric $\boldsymbol{M}\in\mathbb{R}^{N\times N}$ can be partitioned as
[TABLE]
where $\boldsymbol{A}\in\mathbb{R}^{m\times m}$ is symmetric, $\boldsymbol{B}\in\mathbb{R}^{m\times m}$ is persymmetric, the central entry is a real scalar, and $\boldsymbol{c}\in\mathbb{R}^{m\times 1}$. It turns out that the total number of parameters is
$$m(m+1)+m+1=(m+1)^{2}=\left(\frac{N+1}{2}\right)^{2}.$$
Finally, assume that $\boldsymbol{M}\in\mathbb{C}^{N\times N}$ is centrohermitian; then it can be partitioned as [49]
[TABLE]
where $\boldsymbol{A}\in\mathbb{C}^{m\times m}$ is Hermitian, $\boldsymbol{B}\in\mathbb{C}^{m\times m}$ is persymmetric, the central entry is a real scalar, and $\boldsymbol{c}\in\mathbb{C}^{m\times 1}$. As a consequence, the number of parameters characterizing $\boldsymbol{M}$ is
$$m^{2}+m(m+1)+2m+1=\frac{N(N+1)}{2}.$$
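The parameter counts discussed in this appendix can be cross-checked numerically. The sketch below is only an illustration under stated assumptions: the closed forms in `num_params` are the standard dimension counts for real symmetric centrosymmetric matrices and for complex Hermitian centrohermitian matrices (counted as real-valued parameters), and `numerical_dimension` estimates the same quantity as the rank of a set of randomly generated structured matrices. Both function names are hypothetical.

```python
import numpy as np

def num_params(N, complex_valued):
    """Closed-form number of real-valued parameters (assumed standard counts)."""
    if complex_valued:
        return N * (N + 1) // 2            # Hermitian and centrohermitian
    m = N // 2
    # Symmetric and centrosymmetric: m(m+1) for even N, (m+1)^2 for odd N.
    return m * (m + 1) if N % 2 == 0 else (m + 1) ** 2

def numerical_dimension(N, complex_valued, n_samples=200, seed=0):
    """Estimate the real dimension of the structured set via a numerical rank."""
    rng = np.random.default_rng(seed)
    J = np.fliplr(np.eye(N))               # exchange (reversal) matrix
    rows = []
    for _ in range(n_samples):
        X = rng.standard_normal((N, N))
        if complex_valued:
            X = X + 1j * rng.standard_normal((N, N))
        Y = (X + X.conj().T) / 2           # project onto symmetric / Hermitian
        Y = (Y + J @ Y.conj() @ J) / 2     # project onto centrosymmetric / centrohermitian
        # Stack real and imaginary parts so that the rank is taken over the reals.
        rows.append(np.concatenate([Y.real.ravel(), Y.imag.ravel()]))
    return int(np.linalg.matrix_rank(np.array(rows)))

for N in (4, 5):
    for is_complex in (False, True):
        print(N, is_complex, num_params(N, is_complex),
              numerical_dimension(N, is_complex))
```

The two projections commute, so their composition maps a random matrix onto the intersection of the two structured sets, and the rank of enough such samples equals the number of free real parameters.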
Appendix B Gradient of the Log-Likelihood Functions
As a preliminary remark, observe that the ICM is always either Hermitian or symmetric. Let us first focus on $s(\boldsymbol{p}_{i},H_{i};\boldsymbol{z})$ and evaluate the first derivative of this function with respect to the $l$th component of $\boldsymbol{p}_{i}$. Two cases are possible: $\boldsymbol{p}_{i}(l)$ is a component of $\boldsymbol{\theta}_{i}$ or $\boldsymbol{p}_{i}(l)$ is a component of $\boldsymbol{\alpha}$.
As for the first case, it is possible to show that
[TABLE]
where the last equality comes from equations and of [50]. The above equation can be further simplified observing that
[TABLE]
where $\boldsymbol{C}_{i}\in\mathbb{C}^{N^{2}\times m_{i}}$ is a transformation matrix that depends on the specific structure of $\boldsymbol{M}_{i}$ and on how $\boldsymbol{\theta}_{i}$ is defined. For instance, if $\boldsymbol{M}_{i}$ is Hermitian unstructured with and
[TABLE]
then
[TABLE]
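A transformation matrix of this kind can be sketched numerically. The example below assumes one possible parametrization of an unstructured Hermitian matrix, which is not necessarily the one adopted in the paper: the real vector $\boldsymbol{\theta}$ collects the diagonal entries, then the real parts, then the imaginary parts of the strictly lower-triangular entries. The helper names `hermitian_basis` and `transformation_matrix` are hypothetical.

```python
import numpy as np

def hermitian_basis(N):
    """Basis matrices E_l such that M = sum_l theta[l] * E_l is Hermitian
    for any real theta (assumed ordering: diagonal, real parts, imaginary parts)."""
    basis = []
    for k in range(N):                       # real diagonal entries
        E = np.zeros((N, N), dtype=complex)
        E[k, k] = 1.0
        basis.append(E)
    for r in range(N):                       # real parts of off-diagonal entries
        for c in range(r):
            E = np.zeros((N, N), dtype=complex)
            E[r, c] = E[c, r] = 1.0
            basis.append(E)
    for r in range(N):                       # imaginary parts of off-diagonal entries
        for c in range(r):
            E = np.zeros((N, N), dtype=complex)
            E[r, c], E[c, r] = 1j, -1j
            basis.append(E)
    return basis

def transformation_matrix(N):
    """Matrix C with vec(M) = C @ theta, using column-major vectorization."""
    return np.column_stack([E.flatten(order="F") for E in hermitian_basis(N)])

N = 3
C = transformation_matrix(N)
theta = np.arange(1.0, N * N + 1)            # arbitrary real parameter vector
M = (C @ theta).reshape(N, N, order="F")
print(C.shape, np.allclose(M, M.conj().T))
```

With this construction the derivative of $\mathrm{vec}(\boldsymbol{M})$ with respect to the $l$th parameter is simply the $l$th column of the matrix, that is, the matrix multiplied by the $l$th elementary vector.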
It follows that
[TABLE]
where $\boldsymbol{e}_{l,i}$ is the $l$th elementary vector of size $m_{i}$. Moreover, let $\boldsymbol{A}$, $\boldsymbol{B}$, $\boldsymbol{C}$, and $\boldsymbol{D}$ be generic matrices whose sizes are such that the product $\boldsymbol{A}\boldsymbol{B}\boldsymbol{C}\boldsymbol{D}$ makes sense and yields a square matrix; then the following equality holds [51]
[TABLE]
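Identities of this type relate the trace of a four-factor product to Kronecker products and vectorizations. The sketch below numerically checks one standard form, $\mathrm{tr}(\boldsymbol{A}\boldsymbol{B}\boldsymbol{C}\boldsymbol{D})=\mathrm{vec}(\boldsymbol{D}^{T})^{T}(\boldsymbol{C}^{T}\otimes\boldsymbol{A})\,\mathrm{vec}(\boldsymbol{B})$, with column-major vectorization. This form is an assumption made for illustration and may differ from the exact expression taken from [51]; the function names are hypothetical.

```python
import numpy as np

def vec(X):
    """Column-major vectorization (stack the columns of X)."""
    return X.flatten(order="F")

def trace_via_kron(A, B, C, D):
    """Evaluate tr(ABCD) as vec(D^T)^T (C^T kron A) vec(B), without conjugation."""
    return vec(D.T) @ np.kron(C.T, A) @ vec(B)

rng = np.random.default_rng(0)

def random_complex(rows, cols):
    return rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))

# Compatible sizes: (2x3)(3x4)(4x5)(5x2) gives a square 2x2 product.
A, B, C, D = (random_complex(2, 3), random_complex(3, 4),
              random_complex(4, 5), random_complex(5, 2))
print(np.isclose(np.trace(A @ B @ C @ D), trace_via_kron(A, B, C, D)))
```

The check relies on $\mathrm{vec}(\boldsymbol{A}\boldsymbol{B}\boldsymbol{C})=(\boldsymbol{C}^{T}\otimes\boldsymbol{A})\,\mathrm{vec}(\boldsymbol{B})$ and on $\mathrm{tr}(\boldsymbol{X}\boldsymbol{D})=\mathrm{vec}(\boldsymbol{D}^{T})^{T}\mathrm{vec}(\boldsymbol{X})$.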
Thus, the second term of (57) can be recast as
[TABLE]
Gathering the above results and accounting for $\boldsymbol{M}_{i}$ being symmetric or Hermitian, (57) becomes
[TABLE]
where the following equality has been used
[TABLE]
Hence, exploiting (64), it is not difficult to obtain (24). Following the same line of reasoning and replacing $\boldsymbol{S}_{\alpha}$ with $\boldsymbol{S}_{k}$, it is possible to prove (25).
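Derivations of this kind can be sanity-checked with finite differences. The sketch below assumes a complex Gaussian log-likelihood of the form $-\ln\det\boldsymbol{M}-\boldsymbol{x}^{\dagger}\boldsymbol{M}^{-1}\boldsymbol{x}$ up to additive constants, which is an assumption and not the paper's exact $s(\cdot)$. It uses the two standard matrix-calculus facts $\partial\ln\det\boldsymbol{M}=\mathrm{tr}(\boldsymbol{M}^{-1}\partial\boldsymbol{M})$ and $\partial\boldsymbol{M}^{-1}=-\boldsymbol{M}^{-1}(\partial\boldsymbol{M})\boldsymbol{M}^{-1}$. The names `loglik` and `derivative_along` are hypothetical.

```python
import numpy as np

def loglik(M, x):
    """Assumed log-likelihood (constants dropped): -ln det M - x^H M^{-1} x."""
    _, logabsdet = np.linalg.slogdet(M)
    quad = np.real(np.vdot(x, np.linalg.solve(M, x)))
    return -logabsdet - quad

def derivative_along(M, x, E):
    """Analytic derivative of loglik along a Hermitian direction E,
    i.e. d/dt loglik(M + t E, x) at t = 0, for Hermitian M."""
    X = np.linalg.inv(M)
    Xx = X @ x
    # -tr(M^{-1} E) + x^H M^{-1} E M^{-1} x
    return np.real(-np.trace(X @ E) + np.vdot(Xx, E @ Xx))

rng = np.random.default_rng(0)
N = 4
G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
M = G @ G.conj().T + N * np.eye(N)           # Hermitian positive definite
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
E = np.zeros((N, N), dtype=complex)
E[1, 0], E[0, 1] = 1j, -1j                   # perturb one imaginary off-diagonal part

h = 1e-6
finite_diff = (loglik(M + h * E, x) - loglik(M - h * E, x)) / (2 * h)
print(np.isclose(finite_diff, derivative_along(M, x, E), rtol=1e-5, atol=1e-6))
```

Choosing the direction `E` as one basis matrix of the covariance parametrization gives the partial derivative with respect to the corresponding component of the parameter vector.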
As a final step, we evaluate the gradient of $s(\boldsymbol{p}_{i},H_{i};\boldsymbol{z})$ with respect to $\boldsymbol{\alpha}$. To this end, observe that
[TABLE]
Using the above equation, the gradient with respect to $\boldsymbol{\alpha}$ can be expressed as in (26).
Appendix C Hessian of the Log-Likelihood Function
In this appendix we derive the Hessian of the log-likelihood function $s(\boldsymbol{p}_{i},H_{i};\boldsymbol{z},\boldsymbol{Z})$. To this end, consider $\boldsymbol{H}_{\theta\theta,i}$, whose generic entry can be written as
[TABLE]
where the last equality comes from the application of and in [50]. Now, let us focus on the last term of (67) and exploit of [50] to obtain
[TABLE]
The terms involving the second-order derivative of $\boldsymbol{M}_{i}$ can be discarded because
[TABLE]
Thus, the generic entry of $\boldsymbol{H}_{\theta\theta,i}$ can be recast as
[TABLE]
where
[TABLE]
The above expression can be further simplified exploiting (62). More precisely, the first term becomes
[TABLE]
where
[TABLE]
[TABLE]
Using the same line of reasoning, it is possible to recast the last term as follows
[TABLE]
Summarizing, if $\boldsymbol{M}_{i}$ is Hermitian, $\boldsymbol{H}_{\theta\theta,i}$ can be written as
[TABLE]
whereas if $\boldsymbol{M}_{i}$ is symmetric we have that
[TABLE]
Next, consider $\boldsymbol{H}_{\alpha\alpha,i}$ and observe that the gradient of (26) with respect to $\boldsymbol{\alpha}^{T}$ is
[TABLE]
As a final step towards the evaluation of $\boldsymbol{H}_{i}$, we derive the expression for $\boldsymbol{H}_{\alpha\theta,i}$. More precisely, exploiting previous results we get
[TABLE]
where $\boldsymbol{A}_{1}=\boldsymbol{X}_{i}\boldsymbol{v}\boldsymbol{v}^{\dagger}\boldsymbol{X}_{i}\frac{\partial\boldsymbol{X}_{i}}{\partial\boldsymbol{\theta}_{i}(l)}$ and $\boldsymbol{A}_{2}=\boldsymbol{X}_{i}\boldsymbol{v}\boldsymbol{z}^{\dagger}\boldsymbol{X}_{i}\frac{\partial\boldsymbol{X}_{i}}{\partial\boldsymbol{\theta}_{i}(l)}$. Now, assume that the ICM is Hermitian; then using (61), (62), and (65) the above equation can be recast as
[TABLE]
where $\bar{\boldsymbol{\varPhi}}_{i}=(\boldsymbol{X}_{i}^{*}\otimes\boldsymbol{X}_{i})\,\mathbf{vec}\,[\boldsymbol{v}\boldsymbol{v}^{\dagger}]$ and $\tilde{\boldsymbol{\varPhi}}_{i}=(\boldsymbol{X}_{i}^{*}\otimes\boldsymbol{X}_{i})\,\mathbf{vec}\,[\boldsymbol{v}\boldsymbol{z}^{\dagger}]$. As a consequence,
[TABLE]
On the other hand, if the ICM is symmetric, then (76) becomes
[TABLE]
where $\bar{\boldsymbol{\Psi}}_{i}=(\boldsymbol{X}_{i}\otimes\boldsymbol{X}_{i})\,\mathbf{vec}\,[\boldsymbol{v}\boldsymbol{v}^{\dagger}]$ and $\tilde{\boldsymbol{\Psi}}_{i}=(\boldsymbol{X}_{i}\otimes\boldsymbol{X}_{i})\,\mathbf{vec}\,[\boldsymbol{v}\boldsymbol{z}^{\dagger}]$. As a consequence,
[TABLE]
References
- [1] E. J. Kelly, “An adaptive detection algorithm,” IEEE Transactions on Aerospace and Electronic Systems, no. 2, pp. 115–127, 1986.
- [2] F. C. Robey, D. R. Fuhrmann, E. J. Kelly, and R. Nitzberg, “A CFAR adaptive matched filter detector,” IEEE Transactions on Aerospace and Electronic Systems, vol. 28, no. 1, pp. 208–216, 1992.
- [3] F. Bandiera, D. Orlando, and G. Ricci, Advanced Radar Detection Schemes Under Mismatched Signal Models, Synthesis Lectures on Signal Processing No. 8. San Rafael, US: Morgan & Claypool Publishers, 2009.
- [4] L. Cai and H. Wang, “A persymmetric multiband GLR algorithm,” IEEE Transactions on Aerospace and Electronic Systems, vol. 28, no. 3, pp. 806–816, 1992.
- [5] A. De Maio, D. Orlando, C. Hao, and G. Foglia, “Adaptive detection of point-like targets in spectrally symmetric interference,” IEEE Transactions on Signal Processing, vol. 64, no. 12, pp. 3207–3220, 2016.
- [6] G. Pailloux, P. Forster, J. P. Ovarlez, and F. Pascal, “Persymmetric adaptive radar detectors,” IEEE Transactions on Aerospace and Electronic Systems, vol. 47, no. 4, pp. 2376–2390, 2011.
- [7] J. Liu, W. Liu, B. Chen, H. Liu, H. Li, and C. Hao, “Modified Rao test for multichannel adaptive signal detection,” IEEE Transactions on Signal Processing, vol. 64, no. 3, pp. 714–725, 2016.
- [8] J. Liu, G. Cui, H. Li, and B. Himed, “On the performance of a persymmetric adaptive matched filter,” IEEE Transactions on Aerospace and Electronic Systems, vol. 51, no. 4, pp. 2605–2614, 2015.
