A Weighted Model Confidence Set: Applications to Local and Mixture Model Confidence Sets
Amir T. Payandeh Najafabadi, Ghobad Barmalzan, and Shahla Aghaei

TL;DR
This paper introduces a weighted model confidence set designed for situations where models are misspecified but certain data regions are informative, with applications demonstrated in local and mixture models through simulations.
Contribution
It proposes a novel weighted confidence set approach that accounts for model misspecification and emphasizes informative data regions, extending traditional confidence set methods.
Findings
Effective in identifying true models under misspecification
Applicable to local and mixture models with promising results
Validated through two simulation studies
Abstract
This article provides a weighted model confidence set for situations in which the underlying model has been misspecified and some part of the support of the random variable conveys important information about the underlying true model. Applications of this weighted model confidence set to local and mixture model confidence sets are given. Two simulation studies show the practical application of our findings.
Amir T. Payandeh Najafabadi (corresponding author; email: [email protected]), Ghobad Barmalzan, & Shahla Aghaei
Department of Mathematical Sciences, Shahid Beheshti University, G.C. Evin, 1983963113, Tehran, Iran
Keywords: Inference Under Constraints; Kullback–Leibler Divergence; Model Confidence Set; Local Goodness of Fit; Mixture Models.
2010 Mathematics Subject Classification: 62G07, 62F25, 62E17, 62F30, 62F03
1 Introduction
Approximating an unknown density function is an interesting problem with a wide range of applications in statistical inference and data analysis. To find an appropriate approximation, one first has to assemble a collection of families of distributions, say models, which are appropriate (in some sense) competitors for the unknown density. Then, in the second step, the unknown parameters of the selected models have to be estimated using an appropriate (in some sense) estimation method. Such a collection of competing models either belongs to a single family of distributions (a class of nested models) or consists of several families of distributions (a class of non-nested models). In most statistical approaches, a nested model is selected using visual tools (such as QQ-plots) or nonparametric tools (e.g., the Kolmogorov–Smirnov test). The unknown density is then approximated using an appropriate estimation method (such as maximum likelihood or Bayesian estimation). For the non-nested approach, however, selecting appropriate models is a difficult task.
A considerable body of literature has been devoted to inference under a class of non-nested models. For instance, using the generalized likelihood ratio test, Cox (1961, 1962) developed a test statistic to compare two non-nested models. Atkinson (1970) considered a combined model as an appropriate competitor for an unknown model, and then derived a statistical testing procedure to (1) study departure from one model in the direction of another and (2) test the hypothesis that all fitted models are equivalent. Pesaran (1974) employed Cox's test (developed for comparing separate families of hypotheses) to study the choice between two non-nested linear single-equation econometric models. Pesaran & Deaton (1978) extended the findings of Pesaran (1974) to cover multivariate nonlinear models whenever full information maximum likelihood estimation is available. Davidson & MacKinnon (1981, 2002) considered several popular procedures for testing an econometric model and established that those procedures are closely related, but not identical, to the non-nested hypothesis tests proposed by Pesaran & Deaton (1978). Pesaran (1987) emphasized the distinction between a "local null" and a "local alternative" and derived the asymptotic distribution of Cox's test statistic under local alternatives. Fisher & McAleer (1981) derived two modified tests that are asymptotically equivalent to Cox's test. Dastoor (1983) studied the differences between Cox's and Atkinson's statistics. In 1989, using the concept of the Kullback–Leibler divergence, Vuong extended Cox's test. Vuong's test compares two competing density functions based upon the expectation of their likelihood ratio statistic. Based upon minimization of the expectation of the Akaike Information Criterion (AIC), Shimodaira (1998) constructed a confidence set. Lo, et al. (2001) showed that, under Vuong's assumptions, the likelihood ratio statistic based upon the Kullback–Leibler divergence and a random sample, for testing one normal mixture against another normal mixture with a different number of components, is asymptotically distributed as a weighted sum of independent chi-squared random variables with one degree of freedom. Hansen, et al. (2003) showed, via a simulation study, that the model confidence set captures the superior models across a range of significance levels. Lu, et al. (2008) compared Wald's and Cox's statistics in a simulation study and showed that the small-sample behavior of both statistics is close to their asymptotic distributions. Hansen, et al. (2011) employed the confidence set approach to study the density of inflation in a given data set and the best Taylor rule regression for an empirical problem. Sayyareh (2012) constructed a tracking interval for the problem of selecting an appropriate estimate of an unknown density among a class of non-nested families of distributions whenever the data arise from a Type II right-censored phenomenon. Li, et al. (2012) provided a Bayesian approach to the problem of nonparametric estimation of an unknown density.
Using the Kullback–Leibler divergence along with Vuong's test, this article constructs a set of appropriate weighted models, say a weighted model confidence set, for the unknown true density; this set is a subset of a class of non-nested models. A weighted model confidence set is a set of models constructed so that it contains the best model with a given level of confidence. All models that enter the weighted model confidence set are equivalent in terms of closeness to the true density. Applications of such a weighted model confidence set are given for local and mixture goodness of fit. Practical implementation of the results is confirmed through two simulation studies.
The rest of this article is organized as follows. Section 2 collects some pertinent concepts of the Kullback–Leibler divergence and likelihood ratio tests, together with some mathematical background for the problem. The main results are developed in Section 3. Two simulation studies are presented in Section 4.
2 Preliminaries
The Kullback–Leibler divergence is a widely employed statistic for selecting an appropriate model from a class of competing models. Consider the log-likelihood ratio between the unknown true density and a candidate density function, evaluated at an observation. The expectation of this ratio, with respect to the true density, is the Kullback–Leibler divergence between the true density and the candidate density. In other words,
[TABLE]
where the two expectations are, respectively, the irrelevant and relevant parts of the Kullback–Leibler divergence. The relevant expectation plays a vital role in this article. The Kullback–Leibler divergence is nonnegative, but it cannot be considered a distance. From its definition, however, one may conclude that a zero divergence implies that the candidate density coincides with the true density (almost everywhere). To select an appropriate set of models from a class of non-nested competing models, it suffices to consider just the relevant part of the Kullback–Leibler divergence.
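In symbols, the decomposition described above can be sketched as follows. The notation is an assumption made for illustration only: $h$ denotes the true density, $f$ a candidate density, and $E_h$ the expectation under $h$.

```latex
\[
KL(h, f) \;=\; E_h\!\left[\log\frac{h(X)}{f(X)}\right]
\;=\; \underbrace{E_h\big[\log h(X)\big]}_{\text{irrelevant part}}
\;-\; \underbrace{E_h\big[\log f(X)\big]}_{\text{relevant part}} .
\]
```

Because the first term does not depend on the candidate $f$, ranking candidates by their divergence from $h$ is the same as ranking them by the relevant part: a larger value of $E_h[\log f(X)]$ corresponds to a smaller divergence.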
The Hellinger and distances between two density functions and have been defined by
[TABLE]
The Hellinger and  distances are symmetric, non-negative, and satisfy the triangle inequality. Moreover, convergence in Kullback–Leibler divergence implies convergence in the Hellinger and  distances; see Van Erven & Harremoës (2014) for more details. On the other hand, the Kullback–Leibler divergence has a probabilistic/statistical meaning, while the Hellinger and  distances do not. Therefore, in the model selection literature the Kullback–Leibler divergence is a well-known measure of the information lost whenever a candidate density is used to approximate the true density; see Lv & Liu (2014) for more details.
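As a rough numerical illustration of the Hellinger distance mentioned above, the following sketch integrates it on a grid for two normal densities and compares the result with the known closed form. The convention $H^2 = \tfrac12\int(\sqrt{f}-\sqrt{g})^2$ is an assumption (other normalizations exist), and the helper names `normal_pdf` and `hellinger` and the two example densities are hypothetical choices, not taken from the paper.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def hellinger(f, g, lo, hi, steps=20000):
    """Hellinger distance by the midpoint rule, using the convention
    H^2 = 0.5 * integral of (sqrt(f) - sqrt(g))^2."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += (math.sqrt(f(x)) - math.sqrt(g(x))) ** 2
    return math.sqrt(0.5 * total * dx)

# Two illustrative densities: N(0, 1) and N(1, 2^2).
mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 2.0
h_num = hellinger(lambda x: normal_pdf(x, mu1, s1),
                  lambda x: normal_pdf(x, mu2, s2), -30.0, 30.0)

# Closed form for two normal densities under the same convention.
bc = math.sqrt(2 * s1 * s2 / (s1 ** 2 + s2 ** 2)) * math.exp(
    -((mu1 - mu2) ** 2) / (4 * (s1 ** 2 + s2 ** 2)))
h_exact = math.sqrt(1 - bc)

print(h_num, h_exact)
```

Under this convention the distance always lies between 0 and 1, which makes values for different candidate models directly comparable.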
Suppose is a sequence of i.i.d. random variables with common and unknown density function . Moreover, suppose that two competing classes of models
[TABLE]
can be viewed as two appropriate approximations for where and respectively, represent two different parameter spaces with dimensions and . Two classes of models and are called two non-nested classes if and only if Note that, the model is nested in if and only if The following represents formal definition of non-nested models which employs in model selection literature. The model with respect to is called well-specified if and only if there exists such that Otherwise, model with respect to is misspecified.
The most popular criterion in model selection was introduced and developed by Fisher (1921, 1922). In his seminal work, he employed maximum likelihood estimation to develop a selection criterion, well known as the maximum likelihood method. This article uses a more general version of the maximum likelihood method, known as the quasi maximum likelihood estimator (QMLE), which is defined as follows.
Definition 1.
Suppose a random sample is drawn from an unknown density. Moreover, suppose that a weighted density function is an appropriate (in some sense) approximation for this unknown density. An estimator
[TABLE]
which maximizes the log-likelihood of the weighted model, is known as the QMLE of the parameter. Moreover, an estimator
[TABLE]
which minimizes the Kullback–Leibler divergence, is called the pseudo estimator of the parameter.
It is well known that the usual MLE coincides with the QMLE whenever the family of models is well-specified with respect to the true density; see White (1982b) for more details. Huber (1976) showed that the QMLE is a consistent estimator of the pseudo estimator whenever the family is misspecified with respect to the true density.
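The consistency property described above can be illustrated with a small simulation. This is only a sketch under assumed settings: the data are drawn from a lognormal density, the working model is an (unweighted, misspecified) exponential family, and the values of `mu`, `sigma`, and `n` are hypothetical. For this pair the pseudo estimator has the closed form $1/E[X]$.

```python
import math
import random

random.seed(0)

# Hypothetical setup: lognormal data, exponential working model with rate lam.
mu, sigma, n = 0.0, 0.5, 50_000
sample = [random.lognormvariate(mu, sigma) for _ in range(n)]

def exp_loglik(lam, xs):
    """Log-likelihood of the exponential working model."""
    return sum(math.log(lam) - lam * x for x in xs)

# QMLE: maximizer of the working log-likelihood (closed form for this model).
lam_qmle = n / sum(sample)

# Pseudo estimator: minimizer of the Kullback-Leibler divergence from the
# true density, i.e. the maximizer of E[log lam - lam * X], which is 1 / E[X].
lam_pseudo = math.exp(-(mu + sigma ** 2 / 2))

print(lam_qmle, lam_pseudo)
```

With a large sample the two printed values should be close, even though no exponential density equals the lognormal density that generated the data.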
Since the irrelevant part of the Kullback–Leibler divergence is free of the candidate model, two approximations of the true density can be compared, based upon a random sample, through the relevant parts of their Kullback–Leibler divergences.
We now collect some lemmas on empirical distributions and likelihood ratio statistics that are useful in presenting the main results of this article. The following lemma, from Barmalzan & Payandeh Najafabadi (2012), provides an ML estimator for the unknown density function.
Lemma 1.
(Barmalzan & Payandeh Najafabadi, 2012) Suppose is a sequence of i.i.d. random variables with common and unknown density function Moreover, suppose that for given constant random variable has been defined by for . Then, are i.i.d. random variables with common Bernoulli distribution with unknown parameter Therefore,
i)
The ML estimator for is
[TABLE]
ii)
The ML estimator for based upon sample is for
iii)
The ML estimator for based upon sample is
[TABLE]
iv)
An estimator is converged, in probability, to
Based upon a random sample from the unknown true density, the likelihood ratio statistic for one family of weighted models against another is defined as the difference between the log-likelihoods of the two families of weighted models (3), i.e.,
[TABLE]
where and
The following lemma recalls some useful properties of the likelihood ratio statistic; proofs may be found in White (1982a) and Vuong (1989).
Lemma 2.
Suppose a random sample is drawn from a common and unknown true density. Then the likelihood ratio statistic has the following properties:
(i)
Under some mild conditions (White, 1982a) converges, almost surely, to
[TABLE]
(ii)
If , then converges, in law, to normal distribution
where
[TABLE]
and its estimator is
[TABLE]
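The quantities in Lemma 2 can be computed directly from per-observation log-densities. The sketch below follows the standard form of Vuong's (1989) statistic for unweighted models (weight identically one), which is an assumption about how the lost formulas read; the normal and Laplace working models, the sample size, and the function name `lr_test` are hypothetical choices for illustration.

```python
import math
import random
from statistics import median

random.seed(1)
n = 2000
x = [random.gauss(0.0, 1.0) for _ in range(n)]

# Fit two non-nested working models by maximum likelihood.
m = sum(x) / n
s = math.sqrt(sum((v - m) ** 2 for v in x) / n)   # normal: mean and sd
loc = median(x)
b = sum(abs(v - loc) for v in x) / n              # Laplace: location and scale

logf = [-0.5 * math.log(2 * math.pi * s * s) - (v - m) ** 2 / (2 * s * s) for v in x]
logg = [-math.log(2 * b) - abs(v - loc) / b for v in x]

def lr_test(logf, logg):
    """Likelihood ratio statistic, variance estimate, and normalized statistic."""
    n = len(logf)
    d = [a - c for a, c in zip(logf, logg)]
    lr = sum(d)                                   # difference of log-likelihoods
    mean_d = lr / n
    omega2 = sum(v * v for v in d) / n - mean_d ** 2
    z = lr / math.sqrt(n * omega2)                # approx. N(0, 1) under equivalence
    return lr, omega2, z

lr, omega2, z = lr_test(logf, logg)
print(lr, omega2, z)
```

A large positive `z` favors the first model and a large negative `z` favors the second; values near zero indicate that the two models are equally close to the true density.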
The following section provides a model confidence set for the true density function based upon a collection of weighted non-nested models, say a weighted model confidence set. Applications of this weighted model confidence set to local and mixture confidence sets are given in two subsections.
3 Weighted Model Confidence Set
In the traditional model-fitting strategy, all observations receive equal weight. Inference based on naive use of all the information may be erroneous, since the information conveyed by some observations is less important (in some sense) than the information conveyed by others. On the other hand, in many applications only some parts of the space of variables are of interest, so it makes sense to focus attention on those regions. Weighted distributions are ideally suited to modelling these phenomena. The weight function can be determined (or selected) either to reflect such facts or to take related information into account.
Suppose the weight function is a nonnegative function with finite expectation. Then the weighted density function, based on a realization of the random variable under a given density function, is defined by
[TABLE]
where  Increasing the number of unknown parameters contradicts the principle of parsimony (Posada & Buckley, 2004) and artificially affects the Kullback–Leibler divergence (Barmalzan & Sayyareh, 2010). Hereafter, we therefore consider only the situation in which either  is a given constant or  Therefore, we just have  where  and
Fisher (1934) developed the concept of a weighted distribution. Rao (1965) pointed out that in many situations the recorded observations cannot be considered a random sample from the original distribution, due to non-observability of some events, damage caused to the original observations, etc. The length-biased distribution (the weighted distribution whose weight is the observed value itself) has found various applications in biomedical areas, such as early detection of a disease, breast cancer (Zelen & Feinleib, 1969), human families and wildlife population studies (Rao, 1965), and a cardiology study involving two phases (Cnaan, 1985).
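The weighted density and its length-biased special case described above are usually written as follows. The symbols are assumptions made for illustration ($w$ for the weight function, $f(\cdot;\theta)$ for the original density, $f^{w}$ for its weighted version); the form itself is the standard one for weighted distributions.

```latex
\[
f^{w}(x;\theta) \;=\; \frac{w(x)\, f(x;\theta)}{E_{f}[w(X)]},
\qquad
E_{f}[w(X)] \;=\; \int w(t)\, f(t;\theta)\, dt ,
\]
\[
w(x) = x \;\Longrightarrow\;
f^{w}(x;\theta) \;=\; \frac{x\, f(x;\theta)}{E_{f}[X]}
\quad \text{(length-biased case)}.
\]
```

The normalizing expectation is what requires the weight function to have finite expectation; without it, $f^{w}$ would not integrate to one.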
Weighted model selection has not received much attention in the model selection literature. Most of the existing research concerns the Bayes factor (Larose & Dey, 1996, 1998), nested model selection (Cheung, 2005; Ingrassia, et al., 2014), weighted model selection criteria, such as weighted least-squares support vector machines (Cawley, 2006), and information criteria, such as BIC and ICL, for mixture models viewed as weighted models (Dang, et al., 2014).
Now suppose that there is a collection of competing weighted non-nested models which could be used to describe a random sample obtained under a common and unknown density:
[TABLE]
where is parameter space with dimensions and is a nonnegative and given weight function. Moreover, suppose that denotes a collection of weighted non-nested family of models for i.e.,
In traditional model selection, the Kullback–Leibler divergence penalizes the whole support of the random variable equally. A weighted version of the Kullback–Leibler divergence can be defined as
[TABLE]
Now, using the relevant expectation, one may conclude that a class of models can be considered an appropriate approximation of the unknown density if and only if the null hypothesis in the following hypothesis test
[TABLE]
has been accepted at significance level where
The null hypothesis , for in hypothesis test (7), at significance level will be rejected in favor of if and only if
[TABLE]
where
[TABLE]
denotes dimension and is the quantile of standard normal distribution, and stands for the log–likelihood based upon random sample i.e., For more details about this hypothesis test in the non-nested models and its applications, interested readers may refer to Barmalzan & Payandeh Najafabadi (2012), among others.
To illustrate the idea behind our weighted confidence set, consider a situation in which we have only two competing weighted non-nested families of models. One may set up the following two hypothesis tests:
[TABLE]
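Consider a weighted model confidence set for the true density. Using the above two hypothesis tests, one may include the first family in the confidence set whenever the null hypothesis of the first test is accepted at the given significance level. Similarly, one may include the second family, or both families, whenever the null hypothesis of the second test, or of both tests, is accepted at the given significance level. Therefore, the weighted model confidence set is one of three sets: the first family alone, the second family alone, or both families together.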
The following theorem formalizes the above idea. Its proof is similar to Barmalzan & Payandeh Najafabadi (2012, Theorem 1).
Theorem 1.
Suppose is a random sample with common and unknown density Moreover, suppose that is a collection of weighted non-nested family of models for Then, the weighted model confidence set for is given by
[TABLE]
where
Note that the above weighted model confidence set is not empty, because it contains at least the maximum model, with an error smaller than the significance level
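One way to read the construction above is as a screening of every candidate against the model with the largest log-likelihood. The following sketch makes that reading concrete, but it is an assumption rather than the exact set of Theorem 1: it uses a one-sided normalized likelihood ratio test, ignores any correction for model dimension, and works with made-up per-observation log-densities. The function name `confidence_set` and the model labels are hypothetical.

```python
import math
from statistics import NormalDist

def confidence_set(loglik, alpha=0.05):
    """loglik maps a model name to its per-observation (weighted) log-densities,
    evaluated at the fitted parameters. Returns the maximum model and the
    models that are not rejected against it."""
    names = list(loglik)
    n = len(loglik[names[0]])
    best = max(names, key=lambda name: sum(loglik[name]))
    z_crit = NormalDist().inv_cdf(1 - alpha)
    kept = [best]                      # the maximum model is always included
    for name in names:
        if name == best:
            continue
        d = [a - c for a, c in zip(loglik[best], loglik[name])]
        mean_d = sum(d) / n
        var_d = sum((v - mean_d) ** 2 for v in d) / n
        if var_d == 0.0:               # indistinguishable from the best model
            kept.append(name)
            continue
        z = math.sqrt(n) * mean_d / math.sqrt(var_d)
        if z <= z_crit:                # equivalence with the best model not rejected
            kept.append(name)
    return best, kept

# Toy per-observation log-densities for three hypothetical models.
n = 100
loglik = {
    "A": [-1.0] * n,
    "B": [-1.0 - (0.055 if i % 2 == 0 else -0.045) for i in range(n)],
    "C": [-1.0 - (0.6 if i % 2 == 0 else 0.4) for i in range(n)],
}
best, kept = confidence_set(loglik)
print(best, kept)
```

In this toy input, model B is only slightly worse than A and stays in the set, while model C is clearly worse and is dropped; the returned list is never empty because the maximum model is kept by construction.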
Remark 1.
Suppose is a random sample with common unknown true density Moreover, suppose is a collection of weighted non-nested models
[TABLE]
Then, the weighted model confidence set for unknown true density is given by
[TABLE]
where
[TABLE]
The weight function is selected based upon the nature of the problem at hand or local goodness of fit. The above idea may also be extended to mixture model selection.
3.1 Application to Local Model Confidence Set
In practical situations, different parts of the data (or of the support of the random variable) may sometimes be weighted differently. This local consideration is perfectly reasonable, because it provides lower variance for observations and consequently more information; see Hand & Vinciotti (2003) for more details. On the other hand, in many practical problems not all parts of the distribution convey equal information about the phenomenon under study. When the true model is properly specified, this is not a big issue, since a good fit in one region will not lead to a poor fit in another. However, a problem arises whenever the model is misspecified, because a good fit in one region may well reduce the quality of fit in another. Improving the fit of a misspecified model in some specified part of the space, by forcing a close fit between the model and the underlying distribution in that region, may be achieved through differential weighting. Note, however, that the choice of weights ignores the fact that different points may be of differing degrees of importance in the context of the problem.
Using result of Theorem 1, the following provides a local model confidence set.
Remark 2.
Suppose a random sample is drawn from a common and unknown true density. Moreover, suppose that a subset of the support conveys some important information. Then the model confidence set on that subset, say the local model confidence set, is given by:
[TABLE]
where is a collection of weighted non-nested models, and stands for the indicator function.
It is worth mentioning that the above set provides a confidence set for the true density only on the chosen subset, not on the whole support.
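With an indicator weight, the weighted density reduces to the original density restricted to the chosen subset and renormalized. The sketch below illustrates this for a normal working model on an interval; the interval `[a, b]`, the parameter values, and the function names are hypothetical choices, not the paper's.

```python
import math
from statistics import NormalDist

def local_normal_pdf(x, mu, sigma, a, b):
    """Normal density restricted to A = [a, b]: f(x) * I_A(x) / P_f(A)."""
    if x < a or x > b:
        return 0.0
    dist = NormalDist(mu, sigma)
    return dist.pdf(x) / (dist.cdf(b) - dist.cdf(a))

def local_loglik(xs, mu, sigma, a, b):
    """Log-likelihood of the local model, using only observations inside A."""
    inside = [x for x in xs if a <= x <= b]
    return sum(math.log(local_normal_pdf(x, mu, sigma, a, b)) for x in inside)

# Check numerically that the local density integrates to one over A.
a, b = -1.0, 2.0
steps = 10_000
dx = (b - a) / steps
mass = sum(local_normal_pdf(a + (i + 0.5) * dx, 0.0, 1.0, a, b)
           for i in range(steps)) * dx
print(mass)
```

Per-observation values of such a local log-density are what a likelihood ratio comparison between local models would use, which is why the resulting confidence set only speaks about the fit on the subset.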
3.2 Application to Mixture Model Confidence Set
Using the weighted model confidence set, one may go beyond the regular, classical model confidence set and consider a mixture model confidence set. Such a mixture confidence set can be obtained by the following three steps:
Step 1: Partition the whole support of the random variable into a number of partitions;
Step 2: Construct a local confidence set for each partition;
Step 3: Estimate an optimal mixture weight using a different criterion.
Step 3 provides an optimal mixture confidence set with respect to another optimality criterion.
The following theorem illustrates the above three steps when the support of the random variable is partitioned into two disjoint sets whose union is the whole support.
Theorem 2.
Suppose is a random sample with common and unknown true density and and are two collections of and non-nested models, respectively, which locally appropriated for Moreover, suppose that
[TABLE]
are two local model confidence set for unknown density where Then, an optimal global model confidence set for unknown density which minimized distance between convex combination of elements of and and is given by
[TABLE]
where and are two given weight functions in two partitions and (where and ),
[TABLE]
and
Proof. Suppose is a member of Therefore,
[TABLE]
On the other hand, is a convex combination of elements of and Now observe that the distance between and minimized if and only if
[TABLE]
has been maximized in The desired proof arrived from the fact that and is an estimator for
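The argument in the proof suggests estimating the mixture weight by maximizing the sample average of the log of the convex combination. The sketch below does this by a simple grid search. Everything concrete in it is an assumption for illustration: the two local densities are stand-ins (uniform on $[0,1)$ and on $[1,3]$), the data are a fixed grid of points, and the names `f1`, `f2`, and `mean_log_mixture` are hypothetical.

```python
import math

def f1(x):
    """Stand-in local density on the first partition: uniform on [0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0

def f2(x):
    """Stand-in local density on the second partition: uniform on [1, 3]."""
    return 0.5 if 1.0 <= x <= 3.0 else 0.0

# Deterministic toy data: 30 points in the first partition, 70 in the second.
xs = [(i + 0.5) / 30 for i in range(30)] + \
     [1.0 + 2.0 * (i + 0.5) / 70 for i in range(70)]

def mean_log_mixture(alpha, xs):
    """Sample average of log(alpha * f1 + (1 - alpha) * f2)."""
    return sum(math.log(alpha * f1(x) + (1 - alpha) * f2(x)) for x in xs) / len(xs)

# Grid search over the open interval (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
alpha_hat = max(grid, key=lambda a: mean_log_mixture(a, xs))
print(alpha_hat)
```

When the two local densities have disjoint supports, as here, the maximizer is simply the fraction of observations falling in the first partition (0.3 in this toy data), which gives a quick sanity check on any numerical search.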
4 Simulation Study
This section uses two simulation studies to show how the above findings may be employed in practical applications.
Example 1.
Suppose random sample (for ) have been generated from a length biased (i.e., ) Lognormal model with parameters . For this simulation, we consider the following three length biased non-nested models.
[TABLE]
as possible competing models to determine the underlying density
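For readers who want to reproduce this kind of experiment, the length-biased lognormal density has a convenient property: it is again lognormal, with the location parameter shifted by the squared scale parameter. The sketch below checks this identity numerically and uses it to draw a sample. The parameter values `mu` and `sigma` and the sample size are hypothetical, since the values used in the paper are not recoverable here, and Python is used instead of the authors' R code.

```python
import math
import random

def lognormal_pdf(x, mu, sigma):
    """Density of a lognormal(mu, sigma^2) distribution at x > 0."""
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (
        x * sigma * math.sqrt(2 * math.pi))

def length_biased_lognormal_pdf(x, mu, sigma):
    """Length-biased version: x * f(x) / E[X], with E[X] = exp(mu + sigma^2 / 2)."""
    mean = math.exp(mu + sigma ** 2 / 2)
    return x * lognormal_pdf(x, mu, sigma) / mean

mu, sigma = 0.0, 0.5   # hypothetical parameter values

# The length-biased density coincides with lognormal(mu + sigma^2, sigma^2).
for x in (0.3, 1.0, 2.5):
    print(length_biased_lognormal_pdf(x, mu, sigma),
          lognormal_pdf(x, mu + sigma ** 2, sigma))

# Hence a length-biased lognormal sample can be drawn directly.
random.seed(2)
sample = [random.lognormvariate(mu + sigma ** 2, sigma) for _ in range(100)]
```

The same weighting, $x f(x)/E[X]$, applies to the competing models, although their length-biased versions need not stay in the original family.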
To obtain a confidence set for the true lognormal density using the Monte Carlo method in R, we generate, 1000 times, samples of three different sizes from the underlying lognormal model. Using the generated data along with the above three non-nested models, one has to conduct the following hypothesis tests, at the chosen significance level, to build a confidence set for the true model:
[TABLE]
Table 1 shows the decisions of the above hypothesis tests for each sample size.
Table 1: Results of the above hypothesis tests at level

| Sample size | Test 1: statistic | Test 1: conclusion | Test 2: statistic | Test 2: conclusion | Test 3: statistic | Test 3: conclusion |
| --- | --- | --- | --- | --- | --- | --- |
|  | 0.98 | accepted | -1.12 | accepted | -1.48 | accepted |
|  | 1.87 | accepted | -1.88 | accepted | -2.37 | rejected |
|  | 2.24 | accepted | -2.24 | rejected | -2.80 | rejected |
Using the results of Table 1, Table 2 gives the resulting confidence sets for the true lognormal density.
Table 2: A confidence set for different sample size
Sample size A confidence set
The confidence sets in Table 2 show that the method works properly and that the true model (lognormal) falls in the confidence set. For the interpretation of the equivalence of the above confidence sets, see Sayyareh, et al. (2011).
The following example explores a situation in which a single confidence set cannot be considered an appropriate model confidence set for the true density function.
Example 2.
Suppose a random sample has been generated from the following density function.
[TABLE]
A histogram and the true density function are shown in part (a) of Figure 1.
From the histogram, one may readily conclude that the underlying distribution is a continuous distribution with two different modes. Thus, we cannot apply a single density on the whole support. In this case, in the first step, we divide the whole support into two partitions. Based upon graphical investigation, for the first part we consider four non-nested competing models, {normal, Cauchy, logistic, and Laplace}, and for the second part we propose three non-nested competing models, {gamma, Weibull, and lognormal}. In the following steps, we develop a mixture confidence set for the generated observations.
Step 2-1: Local Confidence Set for A.
In this partition, we consider the following four non-nested models.
[TABLE]
where To build up a confidence set for first part, one has to conduct the following hypothesis test at significance level
[TABLE]
Table 3 shows the decisions of the above hypothesis tests after 1000 iterations.
Table 3: Hypothesis tests at significance level for
Hypothesis Test Test Statistics Conclusion
v.s. is rejected
v.s. is rejected
v.s. is accepted
v.s. is accepted
Therefore, the desired local confidence set for the first part is
[TABLE]
Step 2-2: Local Confidence Set for the second partition.
For the second part, we consider the following three non-nested competing models
[TABLE]
where Using the above three competing models along with the following three hypothesis tests,
[TABLE]
one may build a model confidence set for the second part. Table 4 shows the decisions of the above hypothesis tests after 1000 iterations.
Table 4: Hypothesis tests at significance level for
Hypothesis Test Test Statistics Conclusion
v.s. is accepted
v.s. is accepted
v.s. is rejected
Therefore, the desired local confidence set for the second part is
[TABLE]
Step 3: Mixture confidence set.
Now, to construct a mixture model confidence set for the unknown density function, we minimize the distance between the true density and convex combinations of elements of the two local confidence sets. Such a convex combination is given by
[TABLE]
Using the result of Theorem 2, we estimate the optimal mixture weights as , , and . Table 5 reports the Hellinger and the  distances for the mixture model confidence set.
Table 5: The values of Hellinger and distances for the mixture model confidence set
Combining Models Hellinger Distance Distance
As Table 5 shows, all elements of the mixture model confidence set are appropriate choices for the underlying density function given by Equation (10).
Part (b) of Figure 1 illustrates the elements of the above mixture model confidence set together with the true density function. From this figure and Table 5, one may conclude that the above mixture model confidence set provides an appropriate approximation for the true density function (10).
5 Conclusion and Suggestions
This article considers the problem of constructing an appropriate model confidence set whenever the underlying model has been misspecified and some part of the support of the random variable conveys important information about the misspecified density function. Using weighted density functions, this article constructs a weighted model confidence set for the true density function. Applications of this weighted model confidence set to local and mixture model confidence sets are given. Through simulation studies, we have seen that the weighted model confidence set offers a convenient model confidence set for complex data. Our findings cannot be employed in practice whenever the class of non-nested competing models contains a large number of models. Therefore, we suggest developing a single procedure to build such a model confidence set.
Acknowledgements
Thanks to an anonymous reviewer for his/her constructive comments.
References
- [1] Atkinson, A. C. (1970). A method for discriminating between models. Journal of the Royal Statistical Society, 32, 323–344.
- [2] Barmalzan, G. & Payandeh Najafabadi, A. T. (2012). Model Confidence Set Based on Kullback-Leibler Divergence Distance. J. Statist. Res. Iran, 9, 115–129.
- [3] Barmalzan, G. & Sayyareh, A. (2010). The Choice of an Admissible Set of k Non-nested Models. Journal of Statistical Sciences, 4(2), 149–165.
- [4] Cawley, G. C. (2006). Leave-one-out cross-validation based model selection criteria for weighted LS-SVMs. IEEE International Joint Conference on Neural Networks, 1661–1668.
- [5] Cheung, Y. M. (2005). Maximum weighted likelihood via rival penalized EM for density mixture clustering with automatic model selection. IEEE Transactions on Knowledge and Data Engineering, 17(6), 750–761.
- [6] Cnaan, A. (1985). Survival models with two phases and length biased sampling. Communications in Statistics - Theory and Methods, 14(4), 861–886.
- [7] Cox, D. R. (1961). Tests of separate families of hypotheses. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1, 105–123.
- [8] Cox, D. R. (1962). Further results on tests of separate families of hypotheses. Journal of the Royal Statistical Society, 24, 406–424.
