Theory of Spectral Method for Union of Subspaces-Based Random Geometry Graph
Gen Li, Yuantao Gu

TL;DR
This paper develops a theoretical framework for spectral methods in clustering data near unions of subspaces using random geometry graphs, demonstrating broad conditions for effectiveness and supporting findings with numerical experiments.
Contribution
It provides the first comprehensive theory analyzing spectral subspace clustering via random geometry graphs, expanding understanding of its efficiency and potential applications.
Findings
Spectral method effectively clusters data near unions of subspaces.
Theoretical analysis confirms broad conditions for success.
Numerical experiments validate the theoretical predictions.
Abstract
The spectral method is a commonly used scheme for Subspace Clustering, i.e., clustering data points lying close to a Union of Subspaces, which proceeds by first constructing a Random Geometry Graph. This paper establishes a theory to analyze this method. Based on this theory, we demonstrate the efficiency of Subspace Clustering under fairly broad conditions. The insights and analysis techniques developed in this paper might also have implications for other random graph problems. Numerical experiments support our theoretical findings.
The authors are with the Department of Electronic Engineering, Tsinghua University, Beijing 100084, China. The corresponding author of this paper is Y. Gu.
(Manuscript submitted July 23, 2019.)
Keywords: Spectral Method, Union of Subspaces, Subspace Clustering, Random Graph, Random Geometry Graph
1 Introduction
1.1 Motivation
The Union of Subspaces (UoS) model is an important model in statistical machine learning. Briefly speaking, UoS models high-dimensional data, encountered in many real-world problems, that lie close to low-dimensional subspaces corresponding to the classes to which the data belong, such as hand-written digits (Hastie and Simard, 1998), face images (Basri and Jacobs, 2003), DNA microarray data (Parvaresh et al., 2008), and hyper-spectral images (Chen et al., 2011), to name just a few. A fundamental task in processing data points in UoS is to cluster them, which is known as Subspace Clustering (SC). Applications of SC span science and engineering, including motion segmentation (Costeira and Kanade, 1998, Kanatani, 2001), face recognition (Wright et al., 2008), and classification of diseases (McWilliams and Montana, 2014). We refer the reader to the tutorial paper (Vidal, 2011) for a review of the development of SC.
Considering the wide applications of SC, numerous algorithms have been developed for subspace clustering (Tipping and Bishop, 1999, Tseng, 2000, Vidal et al., 2005, Yan and Pollefeys, 2006, Elhamifar and Vidal, 2009, Peng et al., 2018, Meng et al., 2018). Arguably, the most popular and efficient methods for SC are a series of two-step algorithms, namely Sparse Subspace Clustering (SSC) and its variants (Elhamifar and Vidal, 2009, Liu et al., 2012, Dyer et al., 2013, Heckel and Bölcskei, 2015, Chen et al., 2017). These methods first construct a random graph (or, equivalently, an adjacency matrix), called the Union of Subspaces-based Random Geometry Graph (UoS-RGG), which depends on the relative positions of the data points, and then apply the spectral method (Ng et al., 2002, Von Luxburg, 2007) to obtain the clustering result.
Although these algorithms work well in practice for many applications, theoretical guarantees on the clustering accuracy are lacking for any SC algorithm. We note that although novel and often efficient subspace clustering techniques emerge all the time, establishing rigorous theory for such techniques is quite difficult, and no such theory exists as of now. The fundamental difficulty in the analysis of SC algorithms may be the change of view required in treating UoS-RGG (or the general Random Geometry Graph, RGG), which has non-independent edges, in contrast with the traditional approach of analyzing clustering algorithms via the Stochastic Block Model (SBM), which assumes independent edges. Section 1.2 offers a detailed discussion of this difficulty, as well as a survey of existing theoretical attempts. We therefore pose the critical question that this paper aims to explore:
- Why does SC work, or more precisely, why does the spectral method work for RGG or UoS-RGG?
This paper focuses on the analysis of the spectral method for UoS-RGG. We consider a naive, prototypical SC algorithm (Algorithm 1) and prove that this algorithm, though oversimplified, still delivers an almost correct clustering result even when the subspaces are quite close to each other and when the number of samples is far less than the subspace dimension (see Theorem 1). To the best of our knowledge, this is the first theory established to analyze the clustering error of an SC algorithm. It not only constitutes the first theoretical guarantee for the accuracy of subspace clustering, but also provides the insight that the widely conjectured oversampling requirement for subspace clustering is redundant, and that subspace clustering is quite robust in the presence of closely aligned subspaces. We also verify our results by numerical experiments in Section 4. Although our theoretical results are proved only for the simplified algorithm we choose, it is plausible that more carefully designed SC algorithms would perform even better than what we guarantee here, and our proof could serve as a prototype for the analysis of these algorithms.
1.2 Related Works and Challenges
We now briefly review the literature on the adjacency matrix and the spectral method, and discuss its shortcomings. Since this paper mainly deals with theory, we shall focus on the theoretical aspects of existing results.
1.2.1 Analysis of Random Graphs for UoS
To analyze the random graphs associated with the UoS model in an abstract setting without referring to any specific algorithm, most research focuses on the Subspace Detection Property (SDP; Soltanolkotabi et al., 2012, Liu et al., 2012, Soltanolkotabi et al., 2014), a property which indicates that there are no edges between data points in different subspaces, but many edges between data points in the same subspace. Under some technical conditions on the parameters of SC, the random graphs constructed by a variety of SC algorithms have been proved to enjoy SDP. Readers may consult Section 3 in Soltanolkotabi et al. (2014) for details.
There are, however, two main deficiencies of SDP that make it hard to use in further analysis. First, SDP does not imply a correct clustering result: one can easily construct a counter-example where SDP holds but the clustering result is unsatisfactory. Second, SDP holds only under overly restrictive conditions on the affinity between subspaces and the sampling rate. These conditions are provably unnecessary, as will be demonstrated in Section 3 of this paper.
1.2.2 Analysis of Spectral Method for Random Graphs
Compared with SDP, a more concrete approach to analyzing SC algorithms is to investigate the performance of the spectral method on random graphs associated with the UoS model. To this end, the analysis of the spectral method for general random graphs (not necessarily associated with the UoS model) is relevant. Note that the spectral method is explored deeply in the literature on community detection, which is an important problem in statistics, computer vision, and image processing (Abbe, 2017). The Stochastic Block Model (SBM) is a widely used theoretical model in this field, which we briefly introduce as follows. For simplicity, we consider the two-block case, where the vertices of the random graph are divided into two equal-size “blocks”, i.e., sets of vertices that ought to be closely related. Each edge of the random graph is then generated independently: for $i \neq j$, vertices $i$ and $j$ are connected with probability $p$ if they belong to the same block, and with probability $q$ if they belong to different blocks. Given an instance of this graph, we would like to identify the two blocks. Recently, a series of theoretical works has been devoted to analyzing the performance of the spectral method on this problem in different settings (Coja-Oghlan, 2010, Vu, 2014, Chin et al., 2015, Abbe et al., 2017) and extensions (Sankararaman and Baccelli, 2018).
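To make the two-block model concrete, the following is a minimal Python sketch, not taken from the paper. It samples an SBM adjacency matrix with independent edges and splits the vertices by the sign of the eigenvector belonging to the second largest eigenvalue. The function names, the block size of 200, and the probabilities 0.6 and 0.2 are illustrative assumptions, and the sign rule is one common form of spectral bisection rather than the procedure of any cited work.

```python
import numpy as np

def sample_two_block_sbm(block_size, p, q, rng):
    """Sample the adjacency matrix of a two-block SBM with independent edges."""
    labels = np.repeat([0, 1], block_size)
    same_block = labels[:, None] == labels[None, :]
    edge_prob = np.where(same_block, p, q)
    n = 2 * block_size
    # Draw each pair i < j independently, then mirror to keep the matrix symmetric.
    upper = np.triu(rng.random((n, n)) < edge_prob, k=1)
    adjacency = (upper | upper.T).astype(float)
    return adjacency, labels

def spectral_bisection(adjacency):
    """Label vertices by the sign of the eigenvector of the second largest eigenvalue."""
    _, eigvecs = np.linalg.eigh(adjacency)  # eigenvalues come in ascending order
    second = eigvecs[:, -2]
    return (second > 0).astype(int)

rng = np.random.default_rng(0)
adjacency, labels = sample_two_block_sbm(block_size=200, p=0.6, q=0.2, rng=rng)
predicted = spectral_bisection(adjacency)
mismatch = float(np.mean(predicted != labels))
sbm_error = min(mismatch, 1.0 - mismatch)  # labels are only defined up to a swap
print(f"SBM clustering error rate: {sbm_error:.3f}")
```

In this well-separated regime the second eigenvector of the sampled matrix stays close to the block indicator of the expected matrix, which is the perturbation argument that relies on edge independence.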
As far as we know, all existing results make essential use of the independence of different edges, which is unfortunately not the case in SC algorithms. In fact, it is a generic and natural phenomenon in RGG that when the vertex pairs $(i,j)$ and $(j,k)$ are connected, the probability that $(i,k)$ is connected is higher; hence the independence assumption does not hold for RGG.
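The dependence between edges can be observed numerically. The sketch below is an illustration under assumptions of our own choosing: three points drawn uniformly from the unit sphere in dimension 3, and an edge whenever the absolute inner product is at least 0.8. It compares the marginal probability of one edge with its probability given that the two other edges of the triangle are present.

```python
import numpy as np

def sample_sphere(num_points, dim, rng):
    """Draw points uniformly from the unit sphere by normalizing Gaussian vectors."""
    g = rng.standard_normal((num_points, dim))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def triangle_edge_probabilities(dim, tau, trials, rng):
    """Estimate P(edge ik) and P(edge ik | edges ij and jk) by Monte Carlo."""
    x_i, x_j, x_k = (sample_sphere(trials, dim, rng) for _ in range(3))
    edge_ij = np.abs(np.sum(x_i * x_j, axis=1)) >= tau
    edge_jk = np.abs(np.sum(x_j * x_k, axis=1)) >= tau
    edge_ik = np.abs(np.sum(x_i * x_k, axis=1)) >= tau
    both = edge_ij & edge_jk
    return float(edge_ik.mean()), float(edge_ik[both].mean())

rng = np.random.default_rng(0)
marginal, conditional = triangle_edge_probabilities(dim=3, tau=0.8, trials=200_000, rng=rng)
print(f"P(ik) ~ {marginal:.3f},  P(ik | ij, jk) ~ {conditional:.3f}")
```

Under independent edges the two estimates would agree up to sampling noise; in this geometric graph the conditional probability is visibly larger.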
With this fundamental gap in mind, it is crucial to develop a theory for RGG to provide a rigorous theoretical guarantee for SC algorithms.
2 Preliminaries and Problem Formulation
The generative model for data points in UoS that we adopt in this paper is the semi-random model introduced in Soltanolkotabi et al. (2012), which assumes that the subspaces are fixed with points distributed uniformly at random on each subspace. This is arguably the simplest model providing a good starting point for a theoretical investigation. We assume the data consist of two clusters, corresponding to two fixed subspaces $S_1, S_2$ in $\mathbb{R}^n$, each with $N$ data points sampled uniformly from the unit sphere of the respective subspace. Here $d$ is the subspace dimension and $n$ is the ambient dimension. The goal of SC is to cluster the normalized data points. (The number of subspaces is by no means crucial to the analysis; the results in this paper can be generalized to more subspaces easily.)
Given the general description of SC, we turn our attention to a simple prototypical SC algorithm detailed in Algorithm 1, which we call Thresholding Inner-Product Subspace Clustering (TIP-SC). Considering that the angle between data points in the same subspace tends to be smaller, we construct, for some threshold $\tau$, the random graph by computing its adjacency matrix $A$, where $A_{ij} = 1$ if the absolute inner product of the $i$-th and $j$-th data points is at least $\tau$, and $A_{ij} = 0$ otherwise. The TIP-SC algorithm concludes by applying the spectral clustering method to $A$.
The main task of this paper is to prove that this simple algorithm achieves a high clustering accuracy under fairly general conditions, which will be done in the next section.
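The thresholding and spectral steps described above can be sketched in Python as follows. This is a hedged illustration rather than the paper's Algorithm 1: the helper names, the use of two random subspaces, the parameter values (ambient dimension 200, subspace dimension 10, 100 points per subspace, threshold 0.08), the absolute value in the thresholding rule, and the sign-of-second-eigenvector clustering rule are all assumptions made for the example.

```python
import numpy as np

def sample_union_of_subspaces(n, d, points_per_subspace, rng):
    """Draw unit-norm points from two random d-dimensional subspaces of R^n."""
    data = []
    for _ in range(2):
        basis, _ = np.linalg.qr(rng.standard_normal((n, d)))  # orthonormal columns
        coeffs = rng.standard_normal((points_per_subspace, d))
        points = coeffs @ basis.T
        data.append(points / np.linalg.norm(points, axis=1, keepdims=True))
    labels = np.repeat([0, 1], points_per_subspace)
    return np.vstack(data), labels

def tip_sc(points, tau):
    """Threshold absolute inner products, then split by the second eigenvector's sign."""
    gram = np.abs(points @ points.T)
    adjacency = (gram >= tau).astype(float)
    np.fill_diagonal(adjacency, 0.0)  # no self-loops
    _, eigvecs = np.linalg.eigh(adjacency)  # ascending eigenvalues
    return (eigvecs[:, -2] > 0).astype(int)

rng = np.random.default_rng(0)
points, labels = sample_union_of_subspaces(n=200, d=10, points_per_subspace=100, rng=rng)
predicted = tip_sc(points, tau=0.08)
mismatch = float(np.mean(predicted != labels))
tip_sc_error = min(mismatch, 1.0 - mismatch)  # cluster labels are defined up to a swap
print(f"TIP-SC clustering error rate: {tip_sc_error:.3f}")
```

The threshold in this example is deliberately low, so that both within-subspace and cross-subspace edges are common and the resulting graph is dense.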
Notations.
Let denote the orthonormal bases for the subspaces , respectively, and denote the singular values of . We also use and to denote the subspaces to which does and doesn’t belong, respectively. Then where denotes the orthonormal bases for , , and denotes its normalization. We use to represent the probability that for and , respectively. Conditioned on , let denote the probability of for , and denote the probability of for . Denote
[TABLE]
Let with , and , if , and , if , then is the ground truth. denotes the eigenspace corresponding to the top two eigenvalues of , and denotes the vector in , which is perpendicular to the projection of in .
3 Error Rate of TIP-SC Algorithm
This section presents our main theoretical results concerning the performance of TIP-SC. By perturbation analysis of the adjacency matrix around its expectation, the success of the spectral method for SBM has been proved under various statistical assumptions. However, such analysis is insufficient to establish our result: for UoS-RGG the independence condition does not hold, which is the crux of the failure of existing methods for analyzing the spectral method on random graphs. As a substitute, we discover a conditional independence property of the adjacency matrix, based on which we prove that the clustering result of TIP-SC is almost correct under mild conditions on the affinity and the sampling rate, as stated in the following theorem.
Theorem 1.
Choosing such that , there exists some numerical constant , such that whenever , the clustering error rate of TIP-SC is less than with probability at least .
Parameter selection is often critical for the success of algorithms. The above result suggests that a dense graph () is usually a good choice, which is quite different from what SDP suggests.
In this regime, the above result indicates that the algorithm works correctly under fairly broad conditions compared with existing analyses of SC. A notable insight revealed by the above theorem is that clustering can succeed even when the number of samples is smaller than the subspace dimension, which shows that the commonly accepted opinion that oversampling is necessary for SC is partially inaccurate.
To clarify the condition on , namely on affinity, assume these two subspaces overlap in a smaller subspace of dimension , but are orthogonal to each other in the remaining directions. In this case, the affinity between the two subspaces is equal to . Our assumption on indicates that subspaces can have intersections of almost all dimensions, i.e., . In contrast, previous works (Soltanolkotabi et al., 2012, 2014) require that the overlapping dimension obey , so that the subspaces are practically orthogonal to each other.
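The overlapping-subspace example can be checked numerically. The sketch below builds two subspaces that share a few coordinate directions and are orthogonal in the rest, then computes the cosines of their principal angles. The sizes (ambient dimension 50, subspace dimension 10, overlap 4) and the names are illustrative assumptions. Since the affinity formula is not recoverable from the text here, the example uses the convention of Soltanolkotabi et al. (2012), the square root of the sum of squared cosines, and prints it both with and without normalization by the square root of the subspace dimension.

```python
import numpy as np

def overlapping_bases(n, d, s):
    """Two d-dimensional coordinate subspaces of R^n sharing exactly s directions."""
    if 2 * d - s > n:
        raise ValueError("need n >= 2 * d - s")
    identity = np.eye(n)
    basis_1 = identity[:, :d]
    basis_2 = np.hstack([identity[:, :s], identity[:, d:2 * d - s]])
    return basis_1, basis_2

def principal_cosines(basis_1, basis_2):
    """Cosines of the principal angles: singular values of basis_1^T basis_2."""
    return np.linalg.svd(basis_1.T @ basis_2, compute_uv=False)

d, s = 10, 4
basis_1, basis_2 = overlapping_bases(n=50, d=d, s=s)
cosines = principal_cosines(basis_1, basis_2)
affinity = float(np.sqrt(np.sum(cosines ** 2)))  # equals sqrt(s) for this construction
normalized_affinity = affinity / np.sqrt(d)      # equals sqrt(s / d)
print(affinity, normalized_affinity)
```

For this construction exactly `s` cosines equal one and the rest vanish, so the affinity grows with the overlap dimension only.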
In the noisy case, we assume each data point is of the form
[TABLE]
where denotes the clean data used in the above theorem, and is an independent stochastic noise term. We have the following robustness guarantee for TIP-SC.
Theorem 2.
Choosing such that , there exists some numerical constant , such that whenever and , the clustering error rate of TIP-SC is less than with probability at least .
The proof is similar to that of Theorem 1, and both are deferred to Section 5.
4 Numerical Experiments
In this section, we perform numerical experiments validating our main results. We evaluate the algorithm and theoretical results based on the clustering accuracy. The impacts of on the clustering accuracy are demonstrated. Besides, we also show the efficiency of TIP-SC in the presence of noise.
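Clustering accuracy is only defined up to a relabeling of the clusters, so an evaluation has to take the best match between predicted and true labels. The following is a minimal sketch of such an error-rate computation; the function name is hypothetical and the brute-force search over label permutations is an assumption suited to a small number of clusters, not the authors' evaluation code.

```python
from itertools import permutations

def clustering_error_rate(true_labels, pred_labels):
    """Fraction of misclustered points, minimized over relabelings of the clusters."""
    true_labels = list(true_labels)
    pred_labels = list(pred_labels)
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    classes = sorted(set(true_labels) | set(pred_labels))
    best = len(true_labels)
    for perm in permutations(classes):
        relabel = dict(zip(classes, perm))  # candidate renaming of predicted labels
        mistakes = sum(relabel[p] != t for t, p in zip(true_labels, pred_labels))
        best = min(best, mistakes)
    return best / len(true_labels)

print(clustering_error_rate([0, 0, 1, 1], [1, 1, 0, 0]))  # swapped labels, no error
print(clustering_error_rate([0, 0, 1, 1], [0, 1, 1, 1]))  # one of four points wrong
```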
According to the definition of semi-random model, to save computation and for simplicity, the data are generated by the following steps.
- 1) Given and , define , whose entries are zero but the -th entry is one. Let be the orthonormal basis for subspace for , and be the orthonormal basis for subspace for , such that the affinity between and is .
- 2) Given , generate vectors independently from . Let for and for .
- 3) In the presence of noise, given , generate random noise terms independently from . Let the normalized data of be the input of Algorithm 1.
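The three generation steps can be sketched as follows. Several details are not recoverable from the text and are filled in here as labeled assumptions: the second basis is built so that all principal angles equal a single value `theta`, the coefficient vectors are standard Gaussian, and the noise is i.i.d. Gaussian with per-coordinate standard deviation `sigma / sqrt(n)`. All names and parameter values are hypothetical.

```python
import numpy as np

def make_bases(n, d, theta):
    """Coordinate-based bases whose principal angles all equal theta (assumed design)."""
    if 2 * d > n:
        raise ValueError("this construction needs n >= 2 * d")
    identity = np.eye(n)
    basis_1 = identity[:, :d]
    basis_2 = np.cos(theta) * identity[:, :d] + np.sin(theta) * identity[:, d:2 * d]
    return basis_1, basis_2

def generate_points(n, d, points_per_subspace, theta, sigma, rng):
    """Steps 1-3: bases, Gaussian coefficients, optional noise, then normalization."""
    basis_1, basis_2 = make_bases(n, d, theta)
    coeffs = rng.standard_normal((2 * points_per_subspace, d))
    clean = np.vstack([coeffs[:points_per_subspace] @ basis_1.T,
                       coeffs[points_per_subspace:] @ basis_2.T])
    clean /= np.linalg.norm(clean, axis=1, keepdims=True)
    # Assumed noise scaling: the noise vector has norm close to sigma.
    noise = (sigma / np.sqrt(n)) * rng.standard_normal(clean.shape)
    noisy = clean + noise
    points = noisy / np.linalg.norm(noisy, axis=1, keepdims=True)
    labels = np.repeat([0, 1], points_per_subspace)
    return points, labels

rng = np.random.default_rng(0)
demo_basis_1, demo_basis_2 = make_bases(n=60, d=10, theta=np.pi / 4)
demo_points, demo_labels = generate_points(60, 10, 50, np.pi / 4, 0.0, rng)
demo_cosines = np.linalg.svd(demo_basis_1.T @ demo_basis_2, compute_uv=False)
print(demo_points.shape, demo_cosines[:3])
```

Working with coordinate vectors avoids a QR factorization, which matches the stated aim of saving computation.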
Since there are too many factors to consider, we always observe the relation between two quantities of interest while keeping the others at predefined typical values, i.e., , and is chosen to be such that the connection rate . We conduct the experiments in noiseless situations, except the last one, which tests the robustness of Algorithm 1. Moreover, the curves are plotted by trials in each experiment, with the mean and the standard deviation represented by the line and the error bars, respectively. We find that the randomness is eliminated in all experiments when the error rate is small.
It is obvious that will decrease simultaneously if decreases by increasing , which is also demonstrated in Figure 1. Combining this with the result of the second experiment (cf. Figure 2), we find that it is better to make both large than to choose , although is suggested by SDP. This is consistent with our result, and shows that SDP is somewhat inadequate for SC.
In the third and fourth experiments, we inspect the impacts of affinity and sampling rate on the performance of TIP-SC. From Figure 3 and Figure 4, the claim that SC works well in fairly broad conditions is verified. In addition, according to (1), we have
[TABLE]
then the last experiment (cf. Figure 5) shows that the algorithm is robust even when the SNR is low.
5 Proof of Main Results
5.1 Proof of Theorem 1
Recall the definition of in Section 2, and notice that analyzing the error rate, denoted by , is equivalent to studying the difference between and . Without loss of generality we may assume that , thus the error rate is exactly
[TABLE]
To estimate , it suffices to bound the distance between and .
By simple geometric consideration, we have
[TABLE]
where denote the normalization of . Moreover, for any , we have
[TABLE]
where denotes the third largest eigenvalue of .
Summing up, for ,
[TABLE]
Considering that , we expect is a good choice. Similarly, choose .
From above discussion, to estimate we need to:
- Prove and are sufficiently small (see Lemma 3 and Lemma 4).
- Prove and are sufficiently large, which is equivalent to showing is large enough (see Lemma 3) and is small enough (see Lemma 5).
Before proceeding, we analyze the adjacent matrix based on the conditional independence property, and provide probability estimations used in the proof of Theorem 1. Specifically, this refers to if conditioned on for some subset of , , for , are functions of , respectively, and then are independent from each other.
Moreover, recalling the definition of , on the collection of events given by the intersection of
[TABLE]
if conditioned on , , for are nearly identically distributed, and for some , , for are nearly independent from each other, which will be explained and employed many times in the following analysis. According to Lemma 7 and Lemma 8, there exist some constants , such that
[TABLE]
For simplicity, use to denote . In this work, we will always analyze the spectral method on the canonical event set .
Let
[TABLE]
then
Lemma 1.
All are equal, and there exist constants , such that
[TABLE]
where and .
Proof.
Conditioned on , for
[TABLE]
where , and denote the normalization with . According to the independence between and the rotational invariance property of Gaussian random vectors, it is obvious that all are equal. Moreover, we have
[TABLE]
since is a Gaussian random variable independent of , and
[TABLE]
according to Lemma 7. ∎
Lemma 2.
There exist constants , such that for , on , we have
[TABLE]
where .
Proof.
According to Remark 5 in Li and Gu (2017), we can choose such that
[TABLE]
Without loss of generality, assume that , then
[TABLE]
where , and denote the normalization with . In addition, the definition of gives,
[TABLE]
then according to Lemma 7
[TABLE]
and similarly,
[TABLE]
∎
Specifically, according to the above two lemmas about , we can easily get the following lemma.
Lemma 3.
Choose , then . Moreover, on , there exists some constant , such that if ,
[TABLE]
and
[TABLE]
Having finished the calculation about the probability of each entry, we now turn to the overall properties of .
Lemma 4.
Conditioned on , for any
[TABLE]
and
[TABLE]
Proof.
Given , it can be easily checked that the angles between and are independent of each other, so are conditionally independent Bernoulli random variables. Hence, according to Lemma 9, the result follows. ∎
In the next lemma, we will analyze the eigenvalue of .
Lemma 5.
For , on , with probability at least ,
[TABLE]
where denotes the third largest eigenvalue of , and are some constants.
Proof.
We transfer the estimation of to bounding using Lemma 10, i.e.,
[TABLE]
where are defined in Section 2, and
[TABLE]
then , if , , if and , if .
The analysis of is based on the decoupling technique. According to Lemma 11, let be a random subset of with average size , then
[TABLE]
where denotes the sub-matrix of including the rows from and columns from , and denotes the operator norm.
To analyze , we first condition on and , and for , let , and , then are independent of each other. On ,
[TABLE]
Moreover, for the diagonal entries of ,
[TABLE]
On the other hand, for the off-diagonal entries of , if ,
[TABLE]
since . With similar analysis on the cases and , we have the off-diagonal entries of are less than . Hence,
[TABLE]
and Lemma 12 gives, for ,
[TABLE]
Then
[TABLE]
Hence, with probability at least ,
[TABLE]
Summing up,
[TABLE]
We conclude the proof. ∎
Now, we have all the ingredients for the proof of Theorem 1.
Proof of Theorem 1.
We begin with some inequalities for estimating the error. We have
[TABLE]
According to Lemma 4, for all , we have, with probability at least ,
[TABLE]
and
[TABLE]
On the other hand, Lemma 3 gives, with probability at least ,
[TABLE]
Summing up, we have, with probability at least ,
[TABLE]
Similarly, with probability at least ,
[TABLE]
According to Lemma 5, for , with probability at least , the third largest eigenvalue of satisfies
[TABLE]
With these estimations at hand, recall
[TABLE]
Lemma 3 gives , then we have
[TABLE]
We conclude the proof. ∎
5.2 Proof of Theorem 2
The robustness analysis follows the same method. We describe the differences that arise in the presence of noise and omit the details.
Here, we only need to pay attention to how Lemma 3, Lemma 4, and Lemma 5 change when noise is added. Notice that the noise terms do not destroy the conditional independence property, so except for the estimation of , all other bounds still hold in a similar way. By a simple calculation, the contribution of noise has the form
[TABLE]
Taking this change into account, we can get the result easily.
6 Conclusion
This paper establishes a theory to analyze the spectral method for the Random Geometry Graph constructed from data points in a Union of Subspaces. Based on this theory, we demonstrate the efficiency of Subspace Clustering under fairly broad conditions. To the best of our knowledge, such a guarantee on clustering accuracy has not been shown in the prior literature. The insights and analysis techniques developed in this paper might also have implications for other Random Geometry Graphs.
Moving forward, one issue is to understand UoS-RGG constructed by more complex strategies, such as SSC. Additionally, one would ideally desire exact recovery by the spectral method, which requires entrywise analysis. We leave these to future investigation.
Appendix A Auxiliary Lemmas
In this appendix, we introduce some well-known results about Gaussian random variables, Bernoulli random variables, and matrices (Vershynin, 2010), which are used to analyze the properties of the adjacency matrix . We omit the proofs of most of them.
Lemma 6 (Concentration in Gauss space (Ledoux, 2001)).
Let be a real valued Lipschitz function on with Lipschitz constant , i.e.,
[TABLE]
for any (such functions are also called K-Lipschitz). Let be the standard Gaussian random vector in , then for every , one has
[TABLE]
Lemma 7.
Assume , then for any
[TABLE]
Moreover, for and
[TABLE]
Proof.
Let
[TABLE]
then by calculation
[TABLE]
Hence, is and according to Lemma 6, we have
[TABLE]
Take , then similarly
[TABLE]
Moreover, and
[TABLE]
Taking , we prove (2). Taking square, we prove (3). ∎
Here, we also use to denote the angle between and .
Lemma 8 (Concentration of measure (Ledoux, 2001)).
Assume , then for any
[TABLE]
Lemma 9.
are generated independently from , then for any
[TABLE]
Proof.
According to Bernstein’s Inequality, the conclusion is obvious. ∎
Lemma 10.
For any symmetric matrix ,
[TABLE]
where denotes the subspace of of dimension .
Proof.
This is a basic property of eigenvalues. ∎
We define a random subset of with average size as follows. For all , belongs to with probability independently from each other. Then we state an elementary decoupling lemma for double arrays here.
Lemma 11 (Decoupling (Helmers, 2000)).
Consider a double array of real numbers such that for all . Then
[TABLE]
where is a random subset of with average size .
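The decoupling identity can be verified exactly on a small example. The formula of the lemma is not recoverable from the text here, so the sketch below assumes its standard form (as in Vershynin, 2010): each index joins the random subset independently with probability 1/2, the diagonal of the array is zero, and the full double sum equals four times the expected sum over pairs split by the subset. The array size and all names are illustrative.

```python
import random

def decoupled_expectation(a):
    """Exact E[sum over i in I, j not in I of a[i][j]] with inclusion probability 1/2."""
    n = len(a)
    total = 0.0
    for mask in range(1 << n):  # every subset I has probability 2**-n
        members = [i for i in range(n) if (mask >> i) & 1]
        others = [j for j in range(n) if not (mask >> j) & 1]
        total += sum(a[i][j] for i in members for j in others)
    return total / (1 << n)

random.seed(0)
n = 6
# Random double array with zero diagonal, as the lemma requires.
a = [[0.0 if i == j else random.uniform(-1.0, 1.0) for j in range(n)] for i in range(n)]
full_sum = sum(a[i][j] for i in range(n) for j in range(n))
decoupled = 4.0 * decoupled_expectation(a)
print(full_sum, decoupled)
```

The factor four arises because, for distinct indices, the first lies in the subset and the second outside it with probability 1/4.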
Lemma 12 (Matrix Bernstein: Mgf and Cgf Bound, Lemma 6.6.2 (Tropp et al., 2015)).
Suppose that is a random Hermitian matrix that satisfies
[TABLE]
then for
[TABLE]
References
- E. Abbe. Community detection and stochastic block models: recent developments. Journal of Machine Learning Research, 18(1):6446–6531, 2017.
- E. Abbe, J. Fan, K. Wang, and Y. Zhong. Entrywise eigenvector analysis of random matrices with low expected rank. arXiv preprint arXiv:1709.09565, 2017.
- R. Basri and D. W. Jacobs. Lambertian reflectance and linear subspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, (2):218–233, 2003.
- Y. Chen, N. M. Nasrabadi, and T. D. Tran. Hyperspectral image classification using dictionary-based sparse representation. IEEE Transactions on Geoscience and Remote Sensing, 49(10):3973–3985, 2011.
- Y. Chen, G. Li, and Y. Gu. Active orthogonal matching pursuit for sparse subspace clustering. IEEE Signal Processing Letters, 25(2):164–168, 2017.
- P. Chin, A. Rao, and V. Vu. Stochastic block model and community detection in sparse graphs: A spectral algorithm with optimal rate of recovery. In Conference on Learning Theory, pages 391–423, 2015.
- A. Coja-Oghlan. Graph partitioning via adaptive spectral techniques. Combinatorics, Probability and Computing, 19(2):227–284, 2010.
- J. P. Costeira and T. Kanade. A multibody factorization method for independently moving objects. International Journal of Computer Vision, 29(3):159–179, 1998.
