Fast Fusion Clustering via Double Random Projection
Hongni Wang, Na Li, Yanqiu Zhou, Jingxin Yan, Bei Jiang, Linglong Kong, Xiaodong Yan

TL;DR
This paper introduces a faster and more accurate clustering method using random projections to improve computational efficiency and results.
Contribution
The novel double random projection ADMM algorithm improves fusion clustering speed and accuracy in high-dimensional data.
Findings
The new algorithm significantly increases computational speed by reducing complexity.
Multiple random projections improve clustering accuracy under a new evaluation criterion.
The algorithm's convergence is proven and validated on simulated and real data.
Abstract
In unsupervised learning, clustering is a common starting point for data processing. The convex or concave fusion clustering method is a novel approach that is more stable and accurate than traditional methods such as k-means and hierarchical clustering. However, the optimization algorithm used with this method can be slowed down significantly by the complexity of the fusion penalty, which increases the computational burden. This paper introduces a random projection ADMM algorithm based on the Bernoulli distribution and develops a double random projection ADMM method for high-dimensional fusion clustering. These new approaches significantly outperform the classical ADMM algorithm: they increase computational speed by reducing complexity, and they improve clustering accuracy by using multiple random projections under a new evaluation criterion. We also…
Funding
- National Key R&D Program of China
- National Natural Science Foundation of China
- National Statistical Science Research Project
- Jinan Science and Technology Bureau
- China Academy of Engineering Science and Technology Development Strategy Shandong Research Institute Consulting Research Project
- State Scholarship Fund from China Scholarship Council
- Alberta Machine Intelligence Institute (AMII)
- Natural Sciences and Engineering Research Council of Canada (NSERC)
- Canada Research Chair program from NSERC
Taxonomy
Topics: Bayesian Methods and Mixture Models · Anomaly Detection Techniques and Applications · Advanced Clustering Algorithms Research
1. Introduction
Clustering is a pivotal technique in unsupervised learning, applied extensively across scientific and technological fields that handle large datasets. It also plays a crucial role in data labelling, which sets the stage for applying artificial intelligence and machine learning models [1,2] to the organized data for predictive analytics and classification tasks. Traditional clustering algorithms such as k-means, Gaussian mixture models, and hierarchical clustering often face stability challenges because of their non-convex optimization formulations, which make the results sensitive to factors such as initial conditions or data outliers [3,4,5]. Recent advances in convex or concave fusion methods have shown promise in enhancing stability, achieving more consistent global or local optimality and reliable estimation of cluster centers and counts through sparsity-inducing penalties on pairwise centers [6,7,8,9]. For clustering high-dimensional data, the data can be mapped into a high-dimensional feature space (kernel space) for processing [10], or clustering can be achieved by optimizing a smooth and continuous objective function based on robust statistics [11]. This paper presents a comprehensive empirical validation of these methods across simulation studies and real data analysis, detailing their improved stability over traditional methods and the practical implications of these advancements.
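In fusion clustering, the p-dimensional observations $x_i$, $i = 1, \dots, n$, are each parameterized by their own centroid $\mu_i$. These centroids are estimated under the assumption that all observations can be grouped into K clusters $G_1, \dots, G_K$, such that $\mu_i = \alpha_k$ for $i \in G_k$, $k = 1, \dots, K$, where $\alpha_k$ represents the cluster center for observations in cluster $G_k$. Fusion clustering aims to concurrently estimate the cluster centroids and the partition by minimizing the following objective:
$$\min_{\mu_1, \dots, \mu_n} \; \frac{1}{2} \sum_{i=1}^{n} \|x_i - \mu_i\|_2^2 + \sum_{i<j} p_\lambda\left(\|\mu_i - \mu_j\|_q\right). \quad (1)$$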
The penalty function $p_\lambda(\cdot)$ controls the complexity of the model through the tuning parameter $\lambda$, and $\|\cdot\|_q$ denotes the norm applied to the pairwise differences. In fusion clustering, this penalty promotes sparsity in the pairwise differences between centroids, so that observations in the same cluster share a common estimated centroid.
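To make the criterion concrete, the sketch below evaluates a fusion objective of the usual form: a squared-error fit plus a penalty on every pairwise centroid difference. It is an illustration under stated assumptions rather than the authors' code: the function names `mcp` and `fusion_objective`, the Euclidean norm on the differences, and the choice of the standard minimax concave penalty with parameter `gamma` are all assumed for the example.

```python
import numpy as np

def mcp(t, lam, gamma=3.0):
    """Standard minimax concave penalty evaluated at t >= 0 (assumed form)."""
    t = np.asarray(t, dtype=float)
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2.0 * gamma),   # concave part
                    0.5 * gamma * lam**2)             # flat beyond gamma * lam

def fusion_objective(X, mu, lam, gamma=3.0):
    """0.5 * sum_i ||x_i - mu_i||^2 + sum_{i<j} mcp(||mu_i - mu_j||_2)."""
    n = X.shape[0]
    fit = 0.5 * np.sum((X - mu) ** 2)
    i, j = np.triu_indices(n, k=1)                    # all n(n-1)/2 pairs
    diffs = np.linalg.norm(mu[i] - mu[j], axis=1)
    return fit + np.sum(mcp(diffs, lam, gamma))
```

Note that the penalty loop touches all n(n-1)/2 pairs, which is the cost that the random projection discussed later is meant to cut down.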
Convex fusion clustering methods have been widely studied due to their computational simplicity and ability to find global optima. These methods often employ $\ell_1$, $\ell_2$, or $\ell_\infty$ penalties as the penalty function [12,13,14,15,16,17]. However, convex fusion can lead to biased estimates of the individual centroids, resulting in solutions with a large number of dense clusters [18,19]. To address this issue, researchers have proposed using concave fusion clustering methods, such as those using minimax concave penalties (MCPs) [20], truncated Lasso penalties (TLPs) [8], and arbitrary concave penalties.
While robust, convex and concave fusion clustering methods are computationally demanding with a complexity, which can limit their practicality in scenarios involving large sample sizes n and high-dimensional datasets p. This article proposes a strategy for overcoming this limitation using random projection techniques [21,22,23,24]. The approach involves the construction of a random diagonal matrix whose diagonal elements are sourced from a binary distribution. This matrix is then projected onto the pairwise component of the fusion method. By doing so, the number of pairwise differences between individual centroids, , is substantially reduced. This reduction not only decreases the computational load but also maintains the integrity of the clustering process, enhancing the algorithm’s scalability without excessively increasing the operational overhead. We provide empirical evidence demonstrating that this method significantly reduces the computational time while preserving the clustering quality, as shown in our simulation section.
In unsupervised learning, fast clustering is crucial for handling large datasets efficiently. Our study introduces a novel approach to fusion clustering that enhances computational speed without compromising accuracy. Our contributions are summarized as follows: (1) We propose using random projection techniques to simplify the fusion part of clustering, which reduces the number of pairwise centroid differences and boosts computational efficiency by lowering the complexity of the fusion step. (2) We develop a novel double recursive random projection ADMM method for efficient high-dimensional fusion clustering, which improves clustering accuracy.
The remainder of this paper is organized as follows. Section 2 describes the proposed new ADMM algorithm, analyzes its computational complexity and convergence, and presents a strategy for improving cluster accuracy. Section 3 evaluates the finite-sample properties of the proposed algorithm through simulation studies, and Section 4 demonstrates the method on a real data example. Section 5 presents concluding remarks, and technical proofs are provided in Appendix A and Appendix B.
2. Methodology
To improve convex or concave fusion clustering efficiency, we propose an extension of the classical ADMM algorithm based on a random projection called RP-ADMM. A random projection can significantly reduce the time and computational resources needed to analyze high-dimensional data, making it suitable for large datasets and real-time processing. In this section, we will discuss the RP-ADMM algorithm’s computational complexity and convergence.
2.1. Random Projection Based ADMM
Previous ADMM algorithms for convex or concave fusion clustering [6,8] have suffered from a high computational burden due to the need to consider all pairwise differences between individual centroids. This is represented by the fusion matrix , where is the ith unit vector with a 1 in the ith position and 0s elsewhere, and can be interpreted as the difference between the ith and jth individual centroids. The computational complexity of this approach is , which becomes infeasible for large sample sizes n.
- Bernoulli distribution-based random projections ADMM
It is worth noting that some pairwise differences between individual centroids can be deduced from others. For example, if we know that $\mu_i = \mu_j$ and $\mu_j = \mu_k$, we can conclude that $\mu_i = \mu_k$, so it may be unnecessary to consider the row $(e_i - e_k)^\top$ in $\Delta$. To reduce the computational burden of convex or concave fusion clustering, we therefore propose a random projection approach that considers only a small subset of the pairwise differences between individual centroids. We generate indicators $\omega_{ij}$, $i < j$, from a Bernoulli distribution with probability $\pi$, and form a random diagonal matrix $\Omega$ with diagonal elements $\omega_{ij}$. If $\omega_{ij} = 1$, the difference between $\mu_i$ and $\mu_j$ is taken into account; if $\omega_{ij} = 0$, it is not considered. The probability $\pi$ controls the size of the subset of pairwise differences considered. The matrix $\Omega\Delta$ can be seen as a Bernoulli-based projection of $\Delta$ onto a sparse matrix in which about $(1 - \pi)\,n(n-1)/2$ rows are zero vectors and about $\pi\,n(n-1)/2$ rows are nonzero. Finally, we form a new fusion matrix $\tilde{\Delta}$ by deleting the zero rows of $\Omega\Delta$. The new fusion matrix is given by , where denotes the jth row vector of .
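The construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the helper names `pairwise_difference_matrix` and `bernoulli_projected_fusion_matrix` and the argument `prob` (the Bernoulli success probability) are hypothetical, and a dense matrix is used only for readability.

```python
import numpy as np

def pairwise_difference_matrix(n):
    """Full fusion matrix: one row e_i - e_j per pair i < j, shape (n(n-1)/2, n)."""
    i, j = np.triu_indices(n, k=1)
    m = len(i)
    D = np.zeros((m, n))
    D[np.arange(m), i] = 1.0
    D[np.arange(m), j] = -1.0
    return D

def bernoulli_projected_fusion_matrix(n, prob, rng):
    """Keep each pairwise-difference row independently with probability prob."""
    D = pairwise_difference_matrix(n)
    keep = rng.random(D.shape[0]) < prob   # Bernoulli(prob) indicators
    return D[keep], keep                   # rows with indicator 0 are deleted

rng = np.random.default_rng(0)
D_tilde, keep = bernoulli_projected_fusion_matrix(n=6, prob=0.4, rng=rng)
```

Because fusions propagate only along retained pairs, the graph formed by the kept rows should remain connected within each true cluster; a very small probability risks splitting clusters, which is one reason to repeat the projection several times.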
We just consider in (1) for simplicity and propose a random projection-based fusion criterion by
where . Furthermore, the objective function in (2) is equivalent to
where , . Under the constraints in (3), the augmented Lagrangian has the form
where the dual variables are Lagrange multipliers, and is a tuning parameter. Under the iterative value and at the mth step, we conduct the Bernoulli distribution-based random projection ADMM (RP-ADMM) iterative algorithm and compute the estimates of as follows:
where equals
and equals
Ma and Huang (2017) [18] have argued that under (8), the element of is the minimizer of , where . The form of the estimate depends on the thresholding operator , for example:
For the Lasso penalty [25],
For SCAD penalty [26] with ,
For the MCP [27] with ,
For the TLP [8] with ,
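As an illustration of how such penalty-specific updates look in code, the sketch below implements the groupwise soft-thresholding operator and the Lasso, MCP, and SCAD updates in the closed forms commonly attributed to Ma and Huang (2017). These formulas are stated from the general literature and are an assumption here, since the equations of this section are not reproduced; the TLP case is omitted. The names `lam` (tuning parameter), `theta` (ADMM penalty parameter), and `gamma` (concavity parameter) are chosen for the example.

```python
import numpy as np

def group_soft_threshold(z, t):
    """S(z, t) = (1 - t / ||z||)_+ z for a vector z."""
    z = np.asarray(z, dtype=float)
    norm = np.linalg.norm(z)
    if norm <= t:
        return np.zeros_like(z)
    return (1.0 - t / norm) * z

def update_lasso(z, lam, theta):
    return group_soft_threshold(z, lam / theta)

def update_mcp(z, lam, theta, gamma):
    # Assumes gamma > 1 / theta so the denominator is positive.
    z = np.asarray(z, dtype=float)
    if np.linalg.norm(z) <= gamma * lam:
        return group_soft_threshold(z, lam / theta) / (1.0 - 1.0 / (gamma * theta))
    return z.copy()                        # large differences are left unshrunk

def update_scad(z, lam, theta, gamma):
    # Assumes gamma > 1 / theta + 1.
    z = np.asarray(z, dtype=float)
    norm = np.linalg.norm(z)
    if norm <= lam + lam / theta:
        return group_soft_threshold(z, lam / theta)
    if norm <= gamma * lam:
        t = gamma * lam / ((gamma - 1.0) * theta)
        return group_soft_threshold(z, t) / (1.0 - 1.0 / ((gamma - 1.0) * theta))
    return z.copy()
```

The concave penalties differ from the Lasso only in how they treat large differences: beyond `gamma * lam` the input is returned unchanged, which is what reduces the bias of the estimated centroids.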
Through some algebra, the problem of (9) is equivalent to the minimization of the function , which has the form
Under the given value of , , the updated are
where is the identity matrix. and are updated according to the random projection ADMM iterative algorithm (5)–(7) until a convergence criterion is met; in our practice, we require both the dual and primal residuals to be close to zero [28]. The convergence time of ADMM is highly sensitive to the penalty parameter . A poor selection of can result in slow convergence of the ADMM algorithm [29], and thus of RP-ADMM. In this paper, we fix throughout for simplicity.
To facilitate the updates of at the th step in (5) to (7) of the RP-ADMM iterative algorithm, we need to specify a proper initial value (warm start). Here, we set , and obtain the initial estimators as the minimizer of a ridge fusion criterion
We summarize the above analysis in Algorithm 1.
Algorithm 1 RP-ADMM for fusion clustering
Input: data ; initialize , ; tuning parameter
Output: an estimate of
for do
    compute using (5)
    compute using (6)
    compute using (7)
    if the convergence criterion is met then
        stop and denote the last iteration by
    end if
end for
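For readers who prefer code to pseudocode, the following is a minimal sketch of an RP-ADMM style loop with the MCP. It is written under several assumptions that are not taken from the paper: the variable names, the standard scaled-ADMM update order (centroids, then differences, then multipliers), the simple start at the data instead of the ridge fusion warm start, and the default values of `theta` and `gamma`.

```python
import numpy as np

def mcp_threshold_rows(Z, lam, theta, gamma):
    """Row-wise MCP update (requires gamma * theta > 1)."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    safe = np.maximum(norms, 1e-12)                    # avoid division by zero
    shrunk = np.maximum(1.0 - (lam / theta) / safe, 0.0) * Z
    shrunk = shrunk / (1.0 - 1.0 / (gamma * theta))
    return np.where(norms <= gamma * lam, shrunk, Z)

def rp_admm(X, lam, prob, theta=1.0, gamma=3.0, max_iter=500, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    i, j = np.triu_indices(n, k=1)
    keep = rng.random(len(i)) < prob                   # Bernoulli projection
    i, j = i[keep], j[keep]
    m = len(i)
    A = np.zeros((m, n))                               # reduced fusion matrix
    A[np.arange(m), i] = 1.0
    A[np.arange(m), j] = -1.0

    # The system matrix is fixed, so it is factorized once and reused.
    L = np.linalg.cholesky(np.eye(n) + theta * A.T @ A)
    mu = X.copy()
    delta = A @ mu
    v = np.zeros((m, p))
    for _ in range(max_iter):
        rhs = X + A.T @ (theta * delta - v)
        # Generic solves for brevity; a triangular solver would be used in practice.
        mu = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        delta_old = delta
        delta = mcp_threshold_rows(A @ mu + v / theta, lam, theta, gamma)
        primal = A @ mu - delta                        # primal residual
        v = v + theta * primal                         # multiplier update
        dual = theta * A.T @ (delta - delta_old)       # dual residual
        if np.linalg.norm(primal) < tol and np.linalg.norm(dual) < tol:
            break
    return mu, delta
```

With `lam = 0` the loop returns the data unchanged, and with a very large `lam` and `prob = 1` every centroid collapses to the grand mean, which gives two quick sanity checks on any implementation.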
In practice, to save computing time, the RP-ADMM updates need not be run to full convergence in the first iterations. Another trick is to initialize each subsequent convex relaxation at the optimal values of the previous relaxed convex problem, which significantly reduces the number of RP-ADMM iterations.
2.2. Selection of Optimal Tuning Parameter
For a given , the converging value of the above RP-ADMM procedure is defined as
where is defined in (2) and the optimal value of can be selected via a properly constructed data-driven criterion. In particular, we partition the support of into a grid of , and for each , we compute a solution path of and obtain distinct cluster centroids , The optimal is selected by minimizing a data-driven BIC, i.e., , where
Subsequently, we obtain the estimator , and the individuals can be separated into clusters accordingly, i.e., , .
Other methods for selecting tuning parameters in clustering, such as generalized degrees of freedom with generalized cross-validation [8] and stability-based cross-validation [25,30], can provide good results but may require extensive computation or the specification of a hyperparameter perturbation size [8]. In contrast, the proposed BIC is easy to compute and performs well in estimating cluster centroids and the true number of clusters (K). Figure 1 shows the change in BIC values against $\lambda$ and the cluster number in the simulation. Across all cases with different values of n and p, BIC($\lambda$) decreases as $\lambda$ increases until the true cluster number K is recovered, where BIC($\lambda$) reaches its minimum at the optimal $\lambda$. As $\lambda$ keeps increasing, the cluster centroids continue to merge and BIC($\lambda$) increases again. However, further research is needed to fully prove the consistency of the BIC in combination with the objective function (2).
2.3. Recursive RP-ADMM and Cluster Matrix
The above cluster analysis did not consider the effect of randomness on the clustering results. Empirical analysis has shown that the impact of this randomness on the estimated cluster centers and numbers (i.e., ’s and ’s) is minimal, whereas its impact on the final partitioning results (i.e., which observations are grouped into a single cluster) can be significant. In response, we propose the Recursive RP-ADMM (RRP-ADMM) procedure, which generates M random matrices (i.e., ’s, ) and repeats the RP-ADMM cluster analysis for each of them.
Once the multiple RP-ADMM cluster analyses have been completed, we must summarize the results. We define a symmetric cluster matrix where denotes that the ith and jth observations belong to the same cluster; otherwise, . Another symmetric matrix is introduced, with element representing the relative frequency of the ith and jth observations belonging to the same cluster over the M independent RP-ADMM clustering procedures. The decision of whether the ith and jth observations should be grouped into a single cluster or not can then be treated as a classification problem, with the two possible class labels being 1 (belong to the same cluster) or 0 (do not belong to the same cluster). We can use an indicator function to transform the relative frequency into class labels and generate an estimator for the cluster matrix , i.e.,
where denotes the indicator function. We summarize the above procedure in Algorithm 2. This transformation can be understood as a voting-based aggregation strategy, similar to the one proposed by [31], which aims to reduce misclassification errors and improve the accuracy of the clustering. To evaluate the accuracy of the clustering results, we define a new measure called the similarity index (SI) between two data clusterings:
Like the Rand Index (RI) measure [32], the newly introduced evaluation criterion can be seen as a measure of the percentage of correct decisions made by some algorithm. The SI values also range from 0 to 1, with lower values indicating better algorithm performance.
Algorithm 2 RRP-ADMM for fusion clustering
Input: data ; M; initialize , ; tuning parameter
Output: an estimate of
for , M do
    compute using RP-ADMM
end for
while do
    compute and from (13)
end while
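The voting step and a pairwise accuracy measure can be sketched as follows. Two choices here are assumptions made for illustration, since the corresponding formulas are not reproduced in this text: the vote threshold of 0.5, and a disagreement index defined as the fraction of pairs on which two cluster matrices differ, which is merely consistent with the stated range of 0 to 1 and with lower values being better. All function names are hypothetical.

```python
import numpy as np

def cluster_matrix(labels):
    """Entry (i, j) is 1 if observations i and j share a label, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(int)

def vote_cluster_matrix(label_runs, threshold=0.5):
    """Aggregate M clusterings: relative co-clustering frequency, then a vote."""
    freq = np.mean([cluster_matrix(lab) for lab in label_runs], axis=0)
    return (freq > threshold).astype(int), freq

def disagreement_index(C_hat, C_true):
    """Fraction of pairs i < j on which the two cluster matrices differ."""
    iu = np.triu_indices(C_true.shape[0], k=1)
    return float(np.mean(C_hat[iu] != C_true[iu]))

runs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]   # three hypothetical runs
C_hat, freq = vote_cluster_matrix(runs)
```

Working with co-clustering matrices rather than raw labels sidesteps the label-switching problem, because the matrices are invariant to how each run happens to number its clusters.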
The classical convex or concave fusion clustering procedure in (1) requires operations and of storage for a single round of ADMM updates with primal and dual residual calculations, because all pairs of centroids are shrunk together in this method.
The RP-ADMM algorithm significantly improves computational efficiency compared to the classical ADMM algorithm. It requires only of storage, compared to for the classical ADMM algorithm, because the variables and have only columns rather than . Additionally, the RP-ADMM algorithm requires only operations for its most computationally demanding step, in comparison to for the classical ADMM algorithm. The RP-ADMM algorithm also requires operations for the Cholesky factorization, in comparison to for the classical ADMM algorithm; this factorization is computed only once and reused across repeated RP-ADMM updates.
At the end of this subsection, we will demonstrate the convergence of the RP-ADMM algorithm by showing that the sequence generated by the algorithm contains a subsequence that converges to a stationary point.
Lemma 1. Let be the sequence generated by Algorithm 1, then for some constant ,
To prove that the sequence is convergent, we need to assume that is bounded and , conditions that are often observed in numerical tests.
Theorem 1. If are bounded and , then is bounded. Moreover, there exists a subsequence , such that
and thus, has a subsequence which converges to the stationary point.
3. Simulation
In this part of the study, simulation experiments were conducted to compare the performance of the extended and classical ADMM clustering algorithms in terms of computational time and clustering accuracy, using the evaluation criterion in (14). The Lasso-based fusion method applies only a minor penalty to small differences in , which often leads to the formation of many spurious clusters with very small differences among them [6]. In contrast, the concave penalty method tends to produce a clear cluster structure and a well-defined number of clusters [8]. Therefore, in this study, we focus on the MCP-based fusion method [27] and compare the clustering performance of the conventional ADMM with that of the proposed new ADMM algorithms.
3.1. Low-Dimensional Setting
In this part, we evaluated the clustering performance of the classical ADMM, RP-ADMM, and RRP-ADMM algorithms on low-dimensional synthetic data generated from three overlapping convex clusters with the same spherical shape in some number of dimensions p and sample size n. The synthetic data were generated from three populations , with , , , and with and for . This setting was chosen deliberately to allow overlap in the sample sets generated from clusters proximal to each other, thereby increasing the complexity of the clustering task. As illustrated in Figure 2c, the clustering performance using a single random projection (RP-ADMM) was suboptimal, indicating challenges with cluster separability under this setup. Conversely, Figure 2b demonstrates that recursive random projection (RRP-ADMM) significantly improved clustering results. The recursive times for the RP-ADMM and RRP-ADMM algorithms were set to .
To evaluate the accuracy of the RP-ADMM, relax-and-split approach [33] (RS-ADMM) and RRP-ADMM algorithms in recovering the true cluster matrix, we generated a random sample of observations with 1–20 drawn from , 21–40 drawn from , and 41–60 drawn from , and set the number of dimensions to . The probability of generating a 1 in the random matrix was set to , where c controls the probability size. The level plots in Figure 2 use colour to visualize the values of 1’s and 0’s in the cluster matrix. The results show that both RP-ADMM and RRP-ADMM can accurately recover the true cluster matrix, with RRP-ADMM showing more accurate gradation than the true cluster matrix. Single random projection (RP-ADMM) can cause high variance in clustering outcomes due to the randomness of the sampling process. To mitigate this issue, we have adopted the voting-based pooling technique [31], which reduces variance by averaging results from recursive random projection (RRP-ADMM).
To further evaluate the performance of the algorithms, we calculated the values of the index SI defined in (14) after 100 replicates under different c choices. We depicted the results as boxplots in Figure 3. These results show that RRP-ADMM consistently improves clustering accuracy compared to RP-ADMM, as evidenced by the smaller median and standard error of SI values.
Next, we compare the performance of classical ADMM and RRP-ADMM in terms of computation time per iteration and the SI after 100 trials. The sample size is varied with points and , while is kept constant. We limit the number of points to 360, because the classical ADMM algorithm requires a significant amount of computation time for a single realization with more points. We also compare the Similarity Index (SI) and the Rand Index (RI) as measures of clustering quality. Computing the RI requires the partition of all points, which we derive from the estimated cluster matrix graph. This process involves first identifying the point with the most neighbors and aggregating the points connected to it as cluster 1, then finding the remaining point with the most edges to form cluster 2, and repeating this process until no points remain.
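The greedy procedure just described can be sketched as below. The function name `greedy_partition` is hypothetical, and the sketch makes two interpretive assumptions: "connected points" is read as direct neighbors in the cluster matrix graph, and degrees are recounted among the points that are still unassigned at each round.

```python
import numpy as np

def greedy_partition(C):
    """Turn a 0/1 cluster matrix into labels by repeatedly peeling off
    the unassigned point with the most unassigned neighbors."""
    C = np.asarray(C).astype(bool).copy()
    np.fill_diagonal(C, False)                 # ignore self-links
    n = C.shape[0]
    labels = np.full(n, -1)
    remaining = np.ones(n, dtype=bool)
    k = 0
    while remaining.any():
        deg = (C & remaining[None, :]).sum(axis=1)
        deg[~remaining] = -1                   # assigned points cannot be hubs
        hub = int(np.argmax(deg))
        members = C[hub] & remaining           # unassigned neighbors of the hub
        members[hub] = True
        labels[members] = k
        remaining &= ~members
        k += 1
    return labels
```

When the estimated cluster matrix is exactly block-structured, this recovers the blocks; with noisy matrices the result depends on the order in which hubs are chosen, which is one reason the text lists this conversion as an open challenge.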
Table 1 shows the mean values of the SI, RI, and the consumed time in seconds for different sample sizes under different methods after 100 replicates. Based on the data in Table 1, we can observe the following: (i) The proposed RRP-ADMM significantly reduces the time required for convex or concave fusion clustering, especially when the sample size increases. (ii) RRP-ADMM produces smaller SI and larger RI values, possibly due to the voting-based pooling technique improving cluster accuracy. (iii) As the sample size increases, the SI and RI values decrease. The boxplots in Figure 4 and Figure 5 demonstrate the superiority of the RRP-ADMM algorithm over the classical ADMM algorithm in terms of both the SI values and the square root of run time, as seen in the results obtained from 100 replicates with four different sample sizes. These results further reinforce our belief in the effectiveness of the RRP-ADMM algorithm.
3.2. High-Dimensional Setting
In this part, we investigate using the double random projection-based alternating direction method of multipliers (DRP-ADMM and DRRP-ADMM) algorithms for clustering high-dimensional data sets. We employ a recursive Gaussian distribution-based random projection strategy in the first step to mitigate the impact of randomness on cluster results. Since the classical ADMM algorithm is computationally intensive in high-dimensional settings, we focus on evaluating the performance of the DRP-ADMM and DRRP-ADMM algorithms with recursive times , using three Gaussian random projections in the outer layer and three binary random projections in the inner layer. The simulated data sets consist of two overlapping convex clusters with the same spherical shape. They are generated using a population , with , . Furthermore, with and for . We consider four high-dimensional cases with and a fixed sample size of .
We evaluate the accuracy of the DRP-ADMM and DRRP-ADMM algorithms in recovering the true cluster matrix. To do this, we first generate a Gaussian random matrix with dimensions in the first projection. The elements of correspond to . We set with and . See [21,23] for the number of projections. In the second step, we generate a diagonal binary random matrix with probability of equaling one. Then, we calculate the values of the SI index defined in Equation (14) and plot the results as boxplots in Figure 6 after 100 replicates for different values of p. The results show that the DRRP-ADMM algorithm consistently outperforms the DRP-ADMM algorithm regarding the median and standard error of the SI values for all values of p, indicating that the DRRP-ADMM algorithm improves clustering accuracy.
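The outer (first) projection of the double scheme can be illustrated with a plain Gaussian random projection. This is a sketch under assumptions: the entries are drawn as N(0, 1/d), the usual Johnson-Lindenstrauss scaling, because the variance used in the paper is not recoverable from this text, and the name `gaussian_random_projection` and the target dimension `d` are hypothetical.

```python
import numpy as np

def gaussian_random_projection(X, d, rng):
    """Map n x p data to n x d using a p x d matrix with N(0, 1/d) entries."""
    p = X.shape[1]
    R = rng.normal(loc=0.0, scale=1.0 / np.sqrt(d), size=(p, d))
    return X @ R

rng = np.random.default_rng(0)
X_high = rng.normal(size=(40, 2000))       # toy high-dimensional sample
X_low = gaussian_random_projection(X_high, d=500, rng=rng)
```

With this scaling, squared distances between observations are preserved in expectation, so the reduced data can be handed to the Bernoulli-projected fusion step as if it were the original sample.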
4. Real Data Analysis
In this study, we use the DrivFace dataset to demonstrate the effectiveness of our proposed clustering procedure. The DrivFace database consists of images of 640 × 480 pixels each, captured from four drivers (two women and two men) over different days and containing facial features such as glasses and beards. Each driver’s images containing similar facial features can be grouped into one cluster, resulting in a total of four clusters, as shown in Figure 7a. This dataset suits our evaluation for two reasons. First, the true labels are known: there are four clusters, and we know which observations belong to each cluster. Second, the images are highly similar across different clusters, which makes them challenging to separate.
Due to the large sample size of the DrivFace dataset, we do not use the classical ADMM algorithm, which would require operations in a single ADMM iteration. Instead, we first scale the samples by each feature and apply the RP-ADMM procedure to estimate individual centers using a grid of values. We plot the of four selected variables in Figure 8, and the scrutiny of Figure 8a implies that some outlying points (influential points) cause the clusters to be dense. We then remove these 55 points and plot a new in Figure 8b. The optimal value, as determined by the developed BIC criterion in Equation (12), is , indicating that the estimated number of clusters is four, the same as the number of drivers. We apply the proposed RRP-ADMM algorithm with a Bernoulli-distribution-based random projection procedure to further improve the cluster accuracy using and a recursive number . Using the estimated optimal tuning parameter of , we obtain the estimated cluster matrix in Figure 7b, which closely resembles the true cluster matrix in Figure 7a. The calculated similarity index (SI) value is . Moreover, the value of Adjusted Rand Index (ARI) is 0.672.
5. Conclusions
We propose using the recursive random projection-based ADMM (RRP-ADMM) method to improve the speed and accuracy of convex and nonconvex fusion clustering. In simulations and real data examples, the RRP-ADMM method demonstrates superior performance in fast calculation and accurate clustering results. The RRP-ADMM algorithm is scalable and can be applied to deal with heterogeneous issues in any setting that involves fusion techniques.
However, some challenges still need to be addressed in this field. One challenge is efficiently transforming the cluster matrix graph into the target partitioning structure and determining the optimal number of clusters. Another challenge is using prior information about which points are more likely to be integrated into a single cluster to reduce the number of pairwise comparisons. Additionally, a further study is needed to determine the theoretical probability of achieving a probability of one in binary random projection. Another future research direction involves performing clustering simultaneously with feature selection, using techniques such as incorporating feature weights [34] or introducing sparsity [14].
References
- 1. Haq M.A. CDLSTM: A novel model for climate change forecasting. Comput. Mater. Contin. 2022, 71, 2. doi:10.32604/cmc.2022.023059
- 2. Haq M.A. SMOTEDNN: A novel model for air pollution forecasting and AQI classification. Comput. Mater. Contin. 2022, 71, 1. doi:10.32604/cmc.2022.021968
- 3. Van Der Kloot W.A., Spaans A.M.J., Heiser W.J. Instability of hierarchical cluster analysis due to input order of the data: The PermuCLUSTER solution. Psychol. Methods 2005, 10, 468. doi:10.1037/1082-989X.10.4.468. PMID: 16393000
- 4. Xu R., Wunsch D. Survey of clustering algorithms. IEEE Trans. Neural Netw. 2005, 16, 645–678. doi:10.1109/TNN.2005.845141. PMID: 15940994
- 5. Yang X., Yan X., Huang J. High-dimensional integrative analysis with homogeneity and sparsity recovery. J. Multivar. Anal. 2019, 174, 104529. doi:10.1016/j.jmva.2019.06.007
- 6. Chi E.C., Lange K. Splitting methods for convex clustering. J. Comput. Graph. Stat. 2015, 24, 994–1013. doi:10.1080/10618600.2014.948181. PMID: 27087770. PMC 4830509
- 7. Lindsten F., Ohlsson H., Ljung L. Clustering using sum-of-norms regularization: With application to particle filter output computation. Proceedings of the 2011 IEEE Statistical Signal Processing Workshop (SSP), Nice, France, 28–30 June 2011; pp. 201–204. doi:10.1109/SSP.2011.5967659
- 8. Pan W., Shen X., Liu B. Cluster Analysis: Unsupervised Learning via Supervised Learning with a Non-convex Penalty. J. Mach. Learn. Res. 2013, 14, 1865. PMID: 24358018. PMC 3866036
