Robust mixture modelling using sub-Gaussian stable distribution
Mahdi Teimouri, Saeid Rezakhah, Adel Mohammdpour

TL;DR
This paper introduces an EM algorithm for mixture models based on sub-Gaussian stable distributions, demonstrating robustness and effectiveness in modeling heavy-tailed data across various datasets.
Contribution
It presents a novel EM algorithm for parameter estimation in mixtures of sub-Gaussian stable distributions, a computationally tractable subclass of stable distributions.
Findings
The proposed model shows robustness in heavy-tailed data scenarios.
It outperforms some existing mixture models in simulations and real data.
The approach is effective for synthetic, simulated, and real datasets.
Abstract
Heavy-tailed distributions are widely used in robust mixture modelling due to possessing thick tails. As a computationally tractable subclass of the stable distributions, sub-Gaussian -stable distribution received much interest in the literature. Here, we introduce a type of expectation maximization algorithm that estimates parameters of a mixture of sub-Gaussian stable distributions. A comparative study, in the presence of some well-known mixture models, is performed to show the robustness and performance of the mixture of sub-Gaussian -stable distributions for modelling, simulated, synthetic, and real data.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBayesian Methods and Mixture Models · Statistical Distribution Estimation and Applications · Statistical Methods and Bayesian Inference
Robust mixture modelling using
sub-Gaussian -stable distribution
Mahdi Teimouri
Saeid Rezakhah
Adel Mohammdpour
Abstract: Heavy-tailed distributions are widely used in robust mixture modelling due to possessing thick tails. As a computationally tractable subclass of the stable distributions, sub-Gaussian -stable distribution received much interest in the literature. Here, we introduce a type of expectation maximization algorithm that estimates parameters of a mixture of sub-Gaussian -stable distributions. A comparative study, in the presence of some well-known mixture models, is performed to show the robustness and performance of the mixture of sub-Gaussian -stable distributions for modelling, simulated, synthetic, and real data.
Keyword: Clustering; EM algorithm; Monte Carlo; Mixture models; Robustness; Stable distribution; sub-Gaussian -stable distribution
1 Introduction
Finite mixture models are in fact a convex combination of two or more probability density functions. As the most critical application, these models received much interest in the model-based clustering which focuses mainly on the mixture of Gaussian distributions. Despite the popularity of Gaussian-based clustering, this algorithm shows poor performance in the presence of outliers. So, robust mixture models are becoming increasingly popular to overcome this issue. Some of these models aim to tackle tail-weight, [1], [4], [23], and [34]; some deal with skewness, [2], [5], [13], and [33]. Stable distributions have received extensive use in vast majority of fields such as finance, and telecommunications, [18], [19], [20], [24], and [27]. Statistical modelling of datasets gathered from these fields using normal distribution is quite improper because of heavy tails. Except for three cases, probability density function (pdf) of the class of stable distributions has not closed form. As a computationally tractable subclass of the multivariate stable distribution, the sub-Gaussian -stable distribution can account for modelling processes with outliers. The sub-Gaussian -stable distributions have received much interest in finance and portfolio optimization, [18], [20], and [25]. So, several attempts have been made to estimate the parameters of sub-Gaussian -stable distribution. Among them we cite [3], [21], [26], and [28]. The sub-Gaussian -stable distribution allows for heavier tails than Student’s distribution; it can be used as a more flexible tool robust model-based clustering. On the other hand, other approaches that have been developed for estimating the parameters of sub-Gaussian -stable distribution have no possibility of being extended for the mixture of sub-Gaussian -stable distributions. This motivated us to develop a method to estimating the parameters of the mixture of sub-Gaussian -stable distributions. It should be noted that idea of the Bayesian approach for estimating the parameters of mixture of sub-Gaussian -stable distribution suggested in [30] and they did not follow it. Our investigations reveal that the proposed EM algorithm shows better performance, regarding execution time than the Bayesian paradigm. The structure of this note is as follows. In what follows, we give some preliminaries. The Proposed EM algorithm is described in Section 3. Section 4 is devoted to performance analysis of the proposed EM algorithm through simulation, real data, and synthetic data.
2 Preliminaries
Let be a sub-Gaussian -stable random vector. Then a random sample
[TABLE]
where is a zero-mean Gaussian random vector with a positive definite symmetric shape matrix , follows S\bigl{(}\alpha/2,\left(\cos(\pi\alpha/4)\right)^{2/\alpha},1,0\bigr{)}, is a location parameter, and . Here, denotes equality in distribution and and are statistically independent. The corresponding observed values of , , and are , , and , respectively, for , [31]. If denotes the pdf of with parameters , , and at point , then the pdf of a -component sub-Gaussian -stable mixture model, , has the form
[TABLE]
where , is the sample size, and s are non-negative mixing parameters that sum to one, i.e. . Hereafter, we use notation SGSM as a symbol for -component sub-Gaussian -stable mixture distribution in which . Identifiability of the SGSM distribution with pdf in (2.2) is valid from [9].
Missing or incomplete observations frequently occur in the statistical studies. The EM algorithm, introduced in [7], is a popular inferential tool for such a situation. The application of EM technique also includes the cases that we encounter the latent variables or models with random parameter provided that they are formulated as a missing value problem. Let be the complete data likelihood function in which and denote the vector of observed and latent observations, respectively. The EM algorithm works iteratively by maximizing the conditional expectation, , of the complete log-likelihood function given the available data and a current estimate of the parameter. Each iteration of EM algorithm involves two steps as the E-step (computing at -th iteration) and the M-step (maximizing with respect to to get ). The E- and M-steps are repeated until convergence occurs.
As the M-step of EM algorithm is analytically intractable, we imply this step with a sequence of conditional maximization, known as CM-step. This procedure is known as ECM algorithm, [17]. A faster extension of EM algorithm, i. e. the ECME algorithm introduced in [15]. It should be noted that all the EM, ECM, and ECME have the same E-step. The ECME algorithm works by maximizing the constrained via some CM-steps and maximizing the constrained marginal likelihood function and some constraints on the parameters, [2]. In cases where implementation of the EM algorithm is difficult, another extension of this algorithm, called stochastic EM (SEM) is useful, [6]. We imply SEM by simulating missing data from conditional density of given and with pdf ; for , and substituting its sample into the complete likelihood function. Then, we apply EM algorithm for the pseudo-complete sample . This process is repeated until convergence occurs for the distribution of the . Under some mild regularity conditions, constitutes a Markov chain that converges to a stationary distribution, [11]. The SEM is generally very robust to the starting values, and the number of iterations is determined via an exploratory data analysis approach such as, graphical display, [11]. Using a burn-in of iterations, the sequence is expected to be close to some stationary point. After a sufficiently large number of iterations, say , the SEM estimation of is given by
[TABLE]
where is burn-in size. Upon above statements, each iteration of the SEM algorithm consists of two steps as follows.
Stochastic imputation (S-) step: Substitute the simulated missing values in the pseudo-complete log-likelihood function at -th iteration. 2. 2.
Maximization (M-) step: Find a , say , which maximizes pseudo-complete log-likelihood function at -th iteration.
The S- and M-steps, in above, are repeated until convergence occurs.
3 EM algorithm for SGSM
We consider as the complete data corresponding to (2.2) where are observed data, , are component labels and are missing observations. That is, if the -th component, for , of is one, then the other components are zero and -th observation is coming from -th component. This occurs with probability . We have
[TABLE]
independently and
[TABLE]
for and . It should be noted that given , s are independent. So, the complete data density function can be represented as
[TABLE]
where
[TABLE]
where and . It follows, from relations (3.3) and (3.4), that the complete data log-likelihood has the following representation
[TABLE]
where is a constant independent of . Considering as a function of component label and missing variable , its conditional expectation becomes
[TABLE]
where
[TABLE]
in which is pdf of a sub-Gaussian -stable random vector defined in (2.1) and
[TABLE]
E-step: The E-step is complete by computing and . Details for computing these quantities are given in Appendix 1. For this, we use package , [29].
M-step: The M-step of the EM algorithm updates the weight and location parameters of -th component in -th iteration as follows.
[TABLE]
[TABLE]
The shape matrix can be updated analytically in M-step, but we prefer to update it along with the tail index in a CM-step (this reduces computational costs). At -th iteration, the updated quantities , , , and ; for and are evaluated from (3.7)-(3.10), respectively. Using these updates, we follow to update and as and ; for in the CM-step. It should be noted that the CM-step can be implemented ?using numerical optimization tools. But the use of the Remark 3.1, which suggests to use a stochastic EM (SEM) algorithm, leads to a mathematically and computationally tractable CM-step.
Remark 3.1
Suppose that is a positive stable random variable with tail index and is an exponential random variable with mean one. Then, the quotient follows a Weibull distribution, independent of , with shape parameter and scale parameter unity, [31].
- •
First step of CM: We consider groups . Let , where is defined by (3.7). If the -th component of is larger than the other components, then is assigned to the -th group ; for , . Now, whose size is consists of and . Using (2.1) and remark 3.1, it follows that
[TABLE]
where is an exponential random variable with mean one independent of , and is a -dimensional zero-mean normal random vector with shape matrix . It is easy to check that,
[TABLE]
where follows a Weibull distribution with shape parameter and scale parameter one. Therefore,
[TABLE]
for , . By considering as the missing observation, the complete data log-likelihood of -th group is
[TABLE]
where is a constant independent of and ; for .
- •
Second step of CM (first step of SEM): For group , simulate the vector from conditional distribution of given , , and ; for and , using the Monte Carlo method, as described in Appendix 2, at the -th cycle of stochastic step.
- •
Third step of CM (second step of SEM): Using the vector of pseudo sample , maximize the right-hand side of (• ‣ 3) with respect to and to find and as
[TABLE]
and is a solution of
[TABLE]
for .
Now replace and at the right-hand side of (3.7) and (3.8), respectively. This completes E-step. Then, complete M-step by updating weight and location parameters at (3.9) and (3.10). Follow three steps of the CM-steps. Repeating this loop for times, the estimated parameters of SGSM, are obtained as the following.
[TABLE]
where is the size of burn-in for stochastic EM involved in CM-step and is either , , , or ; in -th iteration of the EM algorithm for -th group. Our studies reveal that setting and in (3.12) provides satisfactory accuracy in estimations.
In order to implement the proposed EM algorithm, one can use the Bayesian information criterion (BIC) to estimate the number of clusters, . The BIC is defined as , where is the log-likelihood of observations under SGSM distribution, is the number of free parameters, and is the sample size, [14]. Determining , to implementing the proposed EM algorithm, we first use package cluster, [16], for pre-clustering and then package , [29], for initial estimates of the parameters within each cluster.
4 Simulation study and real data analysis
This section has two parts. Firstly, we compare the performance of the proposed EM algorithm to modelling data simulated from SGSM distribution. We also check the robustness of SGSM distribution when data generated from a mixture of (MT) distribution in the presence of a mixture of skew (MST), a mixture of normal (MN), a mixture of skew normal (MSN), and a mixture of generalized hyperbolic (MGH) distributions. Secondly, for synthetic data analysis, we choose four stocks among 30 stocks of Dow Jones data, [21]. Finally, for real data analysis, we focus on data set which involves of six variables made on 100 genuine and 100 counterfeit Swiss bank notes. This dataset is available by loading package , [32]. It should be noted that, during analyses, we use package to model data via MT, MST, MN, and MSN distributions, [22]. Also, package is applied for modelling data by MGH distribution and computing adjusted Rand index (ARI) as a measure of performance, [10].
Example 1: Performance of the SGSM distribution is investigated through a small simulation study. For performance, we simulate 200 times realizations from 3-component SGSM distribution under settings: , , , , , , , and . We choose these settings of parameters from [23]. In following, Figure 1 displays the ARI computed under each of six mixture models. AS it is expected, Figure 1(a), the SGSM model shows the best performance in the sense of median of ARI. In order to investigate the robustness of the SGSM model, we simulate 200 times realizations from 3-component MT distribution under above settings (the degrees of freedoms for first, second, and third components are , , and , respectively). Figure 1(b) shows the computed ARIs when data are generated from MT distribution. As it is seen, surprisingly, MGH and SGSM model shows better performance than MT based on the median of ARI.
Example 2: We choose stocks AXP, JPM, MCD, and SBC stocks from Dow Jones data. It can be checked that a symmetric -stable pdf captures well the empirical distribution of these stocks based on 1247 observations with almost the same tail indices. Also, the joint scatterplots of these stocks are roughly elliptical. This means that a SGS distribution is suitable for modelling these stocks, [21]. Define and . The scatterplots of and have a perfect overlay when is zero and are well-separated when is large (say ). In the following, Figure 2 displays the computed ARI for =0.12,0.1,0.07,0.06,0.045,0.03,0.025. As it is seen, SGSM distribution shows the best performance.
Example 3: Among variables, we choose the width of left edge (left) and bottom margin width (bottom) from data, [8]. Computed ARIs correspond to SGSM model, MT, MST, MN, MSN, and MGH are 0.721102, 0.704122, 0, 0.704122, 0, and 0.7041185, respectively. This report indicates that SGSM gives the best performance.
5 Conclusion remarks
The E-step of the EM algorithm for calculating maximum likelihood estimates of the sub-Gaussian -stable mixture (SGSM) distribution parameters is not tractable computationally. We propose here some methodology that makes it possible to evaluate the intractable E-step for the SGSM distribution. We assume that the number of components is known and starting values for the EM algorithm are estimated using statistical packages have provided for clustering. A simulation study reveals that the proposed EM algorithm is robust against to starting values, outliers, and deviations from model assumptions. This is proved when data are generated from a mixture of distributions. Also, the performance of the proposed EM algorithm is demonstrated using synthetic and real data. We hope practitioners find this model useful for practical purposes. As a possible future work, we would like to develop the methodology proposed here to operator SGSM distribution in which, components of each cluster have different tail weights.
Appendix 1
At -th iteration of the E-step, to compute , we need computing the pdf of a SGS, i.e. f\bigl{(}{\boldsymbol{y}}_{i};\alpha^{(t)}_{j},\Sigma^{(t)}_{j},\boldsymbol{\mu}^{(t)}_{j}\bigr{)}. Also, it can be checked that
[TABLE]
Both of and are computed using package .
Appendix 2
To simulate the pseudo-complete data from conditional distribution of given , , and ; for and , we use rejection sampling by the following steps. We have our idea from [30] as follows. We note that the density function
[TABLE]
as a part of conditional (posterior) density function
[TABLE]
is bounded by some constant independent of . More precisely, by differentiating density with respect to , it turns out that attains its maximum as
[TABLE]
at point . Hence, the rejection sampling approach is employed to generate from the posterior distribution by the following steps.
Simulate a sample, say , from a Weibull distribution with shape parameter and scale unity. 2. 2.
Define and generate a sample from a uniform distribution , say . 3. 3.
If , then accept as an observation pdf given , , and ; for and ; otherwise, go to step 1.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Andrews, J. L. and Mc Nicholas, P. D. (2012). Model-based clustering, classification, and discriminant analysis via mixtures of multivariate t 𝑡 t -distributions. Statistics and Computing , 22, 1021-1029.
- 2[2] Basso, R. M., Lachos, V. H. Cabral, C. R. B., and Ghosh, P. (2010). Robust mixture modeling based on scale mixtures of skew-normal distributions, Computational Statistics and Data Analysis , 54, 2926-2941.
- 3[3] Bodnar, T. and Gupta, A. K. (2011). Estimation of the precision matrix of a multivariate elliptically contoured stable distribution, Statistics , 45(2), 131-142.
- 4[4] Browne, R. P. and Mc Nicholas, P. D. (2015). A mixture of generalized hyperbolic distributions. The Canadian Journal of Statistics , 43(2), 176-198.
- 5[5] Cabral, C. R. B., Lachos, V. H., and Prates, M. O. (2012). Multivariate Mixture Modeling Using Skew-Normal Independent Distributions. Computational Statistics and Data Analysis , 56, 126-142.
- 6[6] Celeux, G. and Diebolt, J. (1985). The SEM algorithm: a probabilistic teacher algorithm derived from the EM algorithm for mixture problem, Computational Statistics Quarterly , 2 (1), 73-82.
- 7[7] Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B , 39, 1-38.
- 8[8] Fraley, C., Raftery, A. E., and Scrucca, L. (2016). mclust : Normal Mixture Modeling for Model-Based Clustering, ?Classification, and Density Estimation, R package version 5.2.
