Sparse-TDA: Sparse Realization of Topological Data Analysis for Multi-Way Classification
Wei Guo, Krithika Manohar, Steven L. Brunton, Ashis G. Banerjee

TL;DR
Sparse-TDA introduces a novel method combining topological data analysis with sparse sampling, efficiently capturing shape features for multi-way classification tasks in high-dimensional data.
Contribution
It presents a new algorithm that selects sparse samples from persistent topological features using QR pivoting, enhancing classification performance.
Findings
Effective on human posture recognition
Improves image texture classification
Demonstrates promising results on benchmark datasets
Table I. Classification accuracy.
| Method | Weighting | SHREC’14 Synthetic | SHREC’14 Real | OuTeX Texture |
| L1-SVM | LW | | | |
| L1-SVM | NW | | | |
| Sparse-TDA | LW | | | |
| Sparse-TDA | NW | | | |
| Kernel TDA | | | | |
Table II. Classifier training time.
| Method | Weighting | SHREC’14 Synthetic | SHREC’14 Real | OuTeX Texture |
| L1-SVM | LW | | | |
| L1-SVM | NW | | | |
| Sparse-TDA | LW | | | |
| Sparse-TDA | NW | | | |
| Kernel TDA | | | | |
W. Guo is with the Department of Industrial & Systems Engineering, University of Washington, Seattle, WA, 98195. K. Manohar is with the Department of Applied Mathematics, University of Washington, Seattle, WA, 98195. S. L. Brunton is with the Department of Mechanical Engineering, University of Washington, Seattle, WA, 98195. A. G. Banerjee is with the Department of Industrial & Systems Engineering and Department of Mechanical Engineering, University of Washington, Seattle, WA, 98195.
Abstract
Topological data analysis (TDA) has emerged as one of the most promising techniques to reconstruct the unknown shapes of high-dimensional spaces from observed data samples. TDA, thus, yields key shape descriptors in the form of persistent topological features that can be used for any supervised or unsupervised learning task, including multi-way classification. Sparse sampling, on the other hand, provides a highly efficient technique to reconstruct signals in the spatial-temporal domain from just a few carefully-chosen samples. Here, we present a new method, referred to as the Sparse-TDA algorithm, that combines favorable aspects of the two techniques. This combination is realized by selecting an optimal set of sparse pixel samples from the persistent features generated by a vector-based TDA algorithm. These sparse samples are selected from a low-rank matrix representation of persistent features using QR pivoting. We show that the Sparse-TDA method demonstrates promising performance on three benchmark problems related to human posture recognition and image texture classification.
Index Terms:
Topological data analysis, sparse sampling, multi-way classification, human posture data, image texture data.
1 Introduction
Multi-way or multi-class classification, where the goal is to correctly predict one of several possible classes for any data sample, poses one of the most challenging problems in supervised learning. However, a large number of real-world sensing problems in a variety of domains, such as computer vision, robotics and remote diagnostics, do consist of multiple classes. Examples include human face recognition for surveillance, object detection for mobile robot navigation, and critical equipment condition monitoring for preventive maintenance. The number of classes in these problems often exceeds ten and sometimes goes up to a hundred, depending on the complexity of the sensed system or environment and the number and types of sensor modalities.
While a whole host of techniques such as artificial neural networks, decision trees, naïve Bayes, nearest neighbors, and support vector machines (SVMs) have been successfully applied for binary classification problems, extensions of these techniques have had mixed success in addressing multi-way classification problems with more than a few classes. Other approaches involving hierarchical classification or transformation to binary classification have not been particularly successful either. The success rates diminish further in the absence of a large number of data samples for each of the labeled classes. The primary reason is that all of these methods encounter difficulties in selecting the right set of distinguishing features among the different classes.
Recent research has started investigating completely new techniques for multi-way classification that attempt to better understand the structure of the underlying high-dimensional sample space. One such class of techniques is topological data analysis, or TDA in short. TDA represents the unknown sample space in the form of persistent shape descriptors that are coordinate free and deformation invariant. Thus, the descriptors define topological features and yield insights regarding suitable feature selection.
Another critical tool facilitating multi-way classification is the feature-driven sparse sampling of high-dimensional data. Observations are typically sparse in a transform basis of the informative features, so that samples can be optimally chosen to enhance the discriminating features in the data. This sparsity permits heavily subsampled inputs for downstream classifiers, which drastically reduces the burdens of sample acquisition, processing and storage without sacrificing performance.
Here, we bring together the two research areas of TDA and sparse sampling in the context of multi-way classification. In particular, we leverage QR pivoting-based sparse sampling for optimal feature selection once the topological features are extracted using a state-of-the-art TDA method. We call our new method the Sparse-TDA algorithm. We test it on three challenging data sets pertaining to 3D meshes of synthetic human postures, 3D meshes of real human postures, and textured images, respectively. We show that it achieves accuracy comparable to that of the kernel TDA method with substantially lower training times, and better accuracy with comparable or lower training times than widely-used L1-regularized classifiers. Thus, our method opens up a new direction in making online multi-way classification practically feasible.
2 Related Work
Over the past decade or so, an increasing interest in utilizing tools from algebraic topology to extract insights from high-dimensional data has given rise to the field of TDA. The successful applications of TDA have spanned a large number of areas, ranging from computer vision [1] to medical imaging [2], biochemistry [3], neuroscience [4] and materials science [5]. A predominant tool in TDA is persistent homology, which tracks the evolution of the topological features in a multi-scale manner to avoid information loss [6, 7]. The multi-scale information is summarized by the persistence diagram (PD), a multiset of points in $\mathbb{R}^2$ that encodes the lifetime (i.e., persistence) of the features.
More recently, researchers have started utilizing TDA for machine learning problems. Pachauri et al. [8] first computed a Gaussian kernel to estimate the density of points on a regular grid for each rasterized PD, and fed the discrete density estimation as a vector into an SVM classifier without any feature selection. However, their method did not establish the stability of the kernel-induced vector representation. Reininghaus et al. [1] then designed a stable multi-scale kernel for PDs motivated by scale-space theory, as will be described in the next section. Experiments on three benchmark data sets showed that this method greatly outperformed an alternative approach based on the persistence landscape [9], a popular statistical treatment of TDA. Similar to this work, Kusano et al. [10] proposed a stable persistence weighted Gaussian kernel, allowing one to control the effect of persistence. However, the computational complexity of both kernel-based methods for calculating the Gram matrix is $O(m^2 n^2)$ if there are $n$ PDs for training and the PDs contain at most $m$ points, which can be quite expensive for many practical applications.
To enable large-scale computations with PDs, recent methods have mapped each PD to a stable vector to allow direct use of vector-based learning methods. For example, Adams et al. [11] constructed vectors by discretizing the weighted sum of probability distributions centered at each point in transformed PDs. Carrière et al. [12] rearranged the entries of the distance matrix between points in a PD and Bonis et al. [13] adopted a pooling scheme to construct the vectors.
Sparse optimized sampling of vectorized PDs can provide a further reduction for improved classifier training performance, by leveraging an initial low-rank feature transformation such as principal components analysis (PCA). In the context of image classification using linear discriminant analysis, Brunton et al. [14] use convex optimization to identify sparse pixel locations that map into the discriminating subspaces in PCA coordinates. Recent advances in model order reduction employ fast matrix pivoting schemes to sample PCA libraries for sparse classification of dynamical regimes in physical systems [15, 16].
In this work, we employ the vector representation from [11] and integrate with a sparse sampling method using QR pivots to identify discriminative features in the presence of noisy and redundant information to further improve classifier training time and sometimes prediction accuracy.
3 Sparse-TDA Method
We now introduce a vector representation of a PD, termed a persistence image (PI), presented in [11]. Since our Sparse-TDA method will combine PI-based TDA with sparse sample selection, we first summarize the sparse sampling method before describing the combination.
3.1 Optimized Sparse Sample Selection
Vectorized PIs sparsely encode topological structure within a few key pixel locations containing nonzero entries. Sampling these PIs at critical pixel locations is often sufficient for training downstream classifiers at a fraction of the runtime required for full PIs. To determine these PI indices, we use a pixel sampling method based on powerful low-rank matrix approximations. First, we arrange the PI vectors from all the training classes into columns of a matrix $X$ and compute its truncated singular value decomposition (SVD) to obtain the dominant PI variation patterns (principal components) $\Psi_r$,
$$X = [x_1 \; x_2 \; \cdots \; x_n] \approx \Psi_r \Sigma_r V_r^T. \tag{1}$$
The SVD truncation parameter $r$ determines the number of pixel samples and is chosen according to the optimal singular value threshold [17]. We then discretely sample the PI principal components using the pivoted QR factorization, an efficient greedy alternative to expensive convex optimization methods. QR pivoting is the workhorse behind discrete sampling for underdetermined least squares problems [18], polynomial interpolation [19], and more recently, model order reduction [20] and sensor placement [21]. The pivoting procedure optimizes a row permutation $C$ of the principal components that is numerically well-conditioned, by factoring $\Psi_r^T C^T$ into unitary and upper-triangular matrices $Q$ and $R$,
$$\Psi_r^T C^T = Q R. \tag{2}$$
The final step converts a given PI, $x$, into a sparsely sampled PI, $\hat{x}$, which retains only the entries of $C x$ at the first $r$ permutation indices; these indices correspond to the selected pixel locations.
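The selection procedure of this subsection can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `pivoted_qr_indices` and `select_pixels` are hypothetical, the truncation rank `r` is passed in by hand rather than obtained from the optimal threshold of [17], and the pivoted QR is written out as a greedy Gram-Schmidt loop for transparency. In practice, a library routine such as `scipy.linalg.qr` with `pivoting=True` would normally be used instead.

```python
import numpy as np


def pivoted_qr_indices(A, k):
    """Return the first k column pivots of a greedy column-pivoted QR of A.

    Assumes k <= rank(A), so the selected column never has zero norm.
    """
    R = np.array(A, dtype=float, copy=True)
    pivots = []
    for _ in range(k):
        norms = np.sum(R ** 2, axis=0)      # squared residual norm of each column
        j = int(np.argmax(norms))           # pivot: column with largest residual
        pivots.append(j)
        q = R[:, j] / np.sqrt(norms[j])     # next orthonormal direction
        R -= np.outer(q, q @ R)             # remove that direction from all columns
    return pivots


def select_pixels(X, r):
    """X has one vectorized PI per column (n_pixels x n_samples).

    Returns r pixel (row) indices chosen by pivoting on the transposed
    rank-r principal components.
    """
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    Psi_r = U[:, :r]                        # dominant PI variation patterns
    return pivoted_qr_indices(Psi_r.T, r)   # columns of Psi_r^T are pixels


# Illustrative usage on random data (hypothetical sizes).
rng = np.random.default_rng(0)
X_demo = rng.random((900, 40))              # 40 vectorized 30 x 30 PIs
pixels = select_pixels(X_demo, 5)
X_sparse = X_demo[pixels, :]                # sparsely sampled PIs, 5 x 40
```

Each sparsely sampled PI is simply the original vector restricted to the selected rows, which is why the downstream classifier sees `r` features instead of the full pixel count.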
3.2 Combining Sparse Sample Selection with Persistence Images
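Let $\{B_1, \ldots, B_n\}$ be a training set of PDs. To construct a PI from a given PD $B$ [11], $B$ is first transformed from birth-death coordinates to birth-persistence coordinates. Let $T: \mathbb{R}^2 \to \mathbb{R}^2$ be the linear transformation,
$$T(x, y) = (x, y - x). \tag{3}$$
A persistence surface $\rho_B: \mathbb{R}^2 \to \mathbb{R}$ on $T(B)$ is defined by
$$\rho_B(z) = \sum_{u \in T(B)} f(u)\, \phi_u(z), \tag{4}$$
where $f$ is a non-negative weighting function that is zero along the horizontal axis, continuous, and piecewise differentiable; $\phi_u$ is a probability distribution with mean $u = (u_x, u_y)$ and variance $\sigma^2$.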
In our experiments, the linear weighting (LW) function is
[TABLE]
where . The form of the nonlinear weighting (NW) function is inspired by the weighting function used in [10] and chosen as
[TABLE]
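We choose $\phi_u$ to be the Gaussian distribution, i.e.,
$$\phi_u(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{(x - u_x)^2 + (y - u_y)^2}{2\sigma^2}\right), \tag{7}$$
where $u = (u_x, u_y)$ is a point of the transformed PD. Then the PI, a matrix of pixel values, is obtained by calculating the integral of the persistence surface $\rho_B$ on each grid box $p$ from discretization,
$$I(\rho_B)_p = \iint_p \rho_B \, dy \, dx. \tag{8}$$
PIs have also been proven to be stable with respect to the 1-Wasserstein distance between PDs [11]. Assume that the number of desired features (i.e., pixel samples) is $r$. Applying the sparse sampling method of Section 3.1 to the matrix $X$ whose columns are the vectorized training PIs, we obtain the row indices of the $r$ optimal pixel locations and the sparsely sampled PIs for the downstream classifiers.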
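To make the construction concrete, the following sketch computes a PI from a small PD. It is an illustration under assumptions rather than the implementation used in the experiments: the function name `persistence_image` and its `extent` argument are hypothetical, the PD is assumed to be non-empty with at least one point of positive persistence, and the default weight (persistence divided by the largest persistence in the diagram) is only a stand-in for a linear weighting, since the exact LW and NW formulas are not reproduced here. Because the Gaussian is separable, the integral over each grid box is the product of two one-dimensional CDF differences, so no numerical quadrature is needed.

```python
import math

import numpy as np


def persistence_image(pd_points, sigma, resolution, extent, weight=None):
    """Persistence image of a PD given as (birth, death) pairs.

    extent = (x_min, x_max, y_min, y_max) in birth-persistence coordinates.
    Rows of the result index persistence, columns index birth.
    """
    pts = np.asarray(pd_points, dtype=float)
    births = pts[:, 0]
    pers = pts[:, 1] - pts[:, 0]            # (x, y) -> (x, y - x)
    if weight is None:
        max_pers = pers.max()
        # Assumed stand-in for a linear weighting: zero at zero persistence.
        weight = lambda p: p / max_pers
    x_edges = np.linspace(extent[0], extent[1], resolution + 1)
    y_edges = np.linspace(extent[2], extent[3], resolution + 1)
    # Standard normal CDF, applied elementwise.
    cdf = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))
    image = np.zeros((resolution, resolution))
    for b, p in zip(births, pers):
        mass_x = np.diff(cdf((x_edges - b) / sigma))   # Gaussian mass per column
        mass_y = np.diff(cdf((y_edges - p) / sigma))   # Gaussian mass per row
        image += weight(p) * np.outer(mass_y, mass_x)
    return image


# Illustrative usage: a 30 x 30 PI, flattened into a feature vector.
pi_demo = persistence_image([(0.1, 0.6), (0.3, 0.4)], sigma=0.05,
                            resolution=30, extent=(0.0, 1.0, 0.0, 1.0))
x_demo = pi_demo.ravel()
```

Flattening the image, as in the last line, gives the vector that would form one column of the training matrix passed to the sparse sampling step.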
4 Results
We now discuss the performance of our Sparse-TDA method on three benchmark computer vision data sets. The data sets are explained first, followed by illustrations of the selected features, and quantitative comparisons of our method with the L1-SVM feature selection method using the same PIs and the multi-scale kernel TDA method. The illustrations and comparison results show the usefulness of the method on challenging multi-way classification problems.
4.1 Data Sets
For shape classification, SHREC’14 synthetic and real data sets are used, given in the format of triangulated 3D meshes [22]. The synthetic set contains meshes from five males, five females and five children in 20 different poses, while the real set consists of 20 males and 20 females in 10 different poses.
For texture recognition, we use the Outex_TC_00000 data set [23]. This data set contains 480 images equally categorized into 24 classes and provides 100 predefined 50/50 training/testing splits. During preprocessing, we downsample the original images, as done in the multi-scale kernel TDA method.
4.2 Feature Selection
We first follow the same procedure performed in the multi-scale kernel TDA method to obtain the PDs. For the SHREC’14 data sets, we compute the heat kernel signature [24] on the surface mesh of each object and then compute the 1-dimensional PDs using Dipha (https://github.com/DIPHA/dipha). For the OuTeX data set, we take the sign component of the completed local binary pattern operator [25] as the descriptor function. Then we generate the 0-dimensional PDs from the filtration of its rotation-invariant version, computed for a fixed number of neighbors and a fixed radius.
To generate the PIs, we set the grid resolution to be 30 × 30 for all three data sets. In fact, the classification accuracy is fairly robust to the choice of resolution [11]. We also set the Gaussian parameter $\sigma$ to be 0.2, 0.0001 and 0.02 for the SHREC’14 synthetic, SHREC’14 real and OuTeX data sets, respectively. Fig. 1 shows representative PIs for three different classes in all of our benchmark data sets. Noticeable differences are observed among the PIs for each of the three data sets, although the differences are most pronounced for the SHREC’14 synthetic data set, reasonably clear for the SHREC’14 real data set, and less evident for the OuTeX data set. These differences in the pixel values of the PIs form the distinguishing class features from which an optimal set is selected by QR pivots. Fig. 2 measures the effect of varying the number of pixel samples, determined by the SVD truncation parameter $r$ in Eq. (1), on classification accuracy and classifier training time (see below for detailed settings). As expected, classifier training time increases with additional samples. The accuracy, however, improves until the number of samples equals the optimal SVD truncation parameter, after which limited additional information is available. Beyond this value, accuracy tapers off, which is consistent with the percentage of PI variance (energy) captured by the truncated SVD. For this reason, $r$ is set to this optimal value for our Sparse-TDA method in the following simulations. In the case of the L1-SVM method, a sparse solution is generated by L1 regularization during the training phase of a linear classifier. No feature selection is involved for the kernel TDA method.
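The link between the number of samples and the captured PI variance can be checked directly from the singular values. The sketch below is a minimal illustration with hypothetical function names (`energy_captured`, `smallest_rank`); it uses a plain cumulative-energy criterion, which is an assumption made for illustration and is not the optimal singular value threshold of [17] used to choose the truncation parameter in the paper.

```python
import numpy as np


def energy_captured(X, r):
    """Fraction of total variance (squared singular values) kept by rank r."""
    s = np.linalg.svd(X, compute_uv=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    return float(energy[r - 1])


def smallest_rank(X, target):
    """Smallest rank whose captured energy reaches the target fraction."""
    s = np.linalg.svd(X, compute_uv=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(energy, target)) + 1
    return min(r, len(s))


# Illustrative usage on a matrix with known singular values 3, 2, 1.
X_demo = np.diag([3.0, 2.0, 1.0])
curve = [energy_captured(X_demo, r) for r in (1, 2, 3)]
```

Plotting such a curve against the rank shows where additional pixel samples stop adding information, which is the behavior described for Fig. 2.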
4.3 Classification Performance
We feed the reduced feature vectors for training into a soft margin C-SVM classifier with a radial basis function (RBF) kernel, implemented in LIBSVM [26], for each data set. The cost factor $C$ and kernel parameter $\gamma$ are tuned based on a grid search using 10-fold cross-validation on the training data. We start with a coarse grid search over exponentially growing sequences of $C$ and $\gamma$, thereafter proceeding with finer grid searches in the vicinity of the optimal region yielded by the previous grid search. Each grid search includes a total of 50 pairs of $(C, \gamma)$ values. The selected pair is used to train the model that is applied to the sparsely sampled PIs of the test set. For the L1-SVM method, only the cost factor $C$ needs to be tuned; it is tuned 10 times using the same scheme as described above with the implementation in LIBLINEAR [27]. Results are reported based on 30 runs for each case with the exception of those presented for the OuTeX data set in Tables I-II, which are based on 100 runs.
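The coarse-to-fine tuning scheme can be sketched independently of any particular SVM library. In the illustration below, `score` is a hypothetical callable standing in for the 10-fold cross-validation accuracy of a classifier trained with a given $(C, \gamma)$ pair. The exponent ranges and step sizes are assumptions (the defaults follow ranges commonly suggested for RBF-kernel SVMs) and are not the grids used in the paper, which evaluate 50 pairs per search.

```python
import itertools
import math


def log2_grid(lo, hi, step):
    """Exponentially growing values 2**lo, 2**(lo + step), ..., 2**hi."""
    n = int(round((hi - lo) / step))
    return [2.0 ** (lo + i * step) for i in range(n + 1)]


def grid_search(score, c_values, g_values):
    """Return the (C, gamma) pair with the highest score."""
    return max(itertools.product(c_values, g_values),
               key=lambda pair: score(*pair))


def coarse_to_fine(score, c_range=(-5, 15), g_range=(-15, 3),
                   coarse_step=2.0, fine_step=0.25):
    """Coarse search over wide exponent ranges, then a finer search
    in the neighborhood of the coarse optimum."""
    best_c, best_g = grid_search(score,
                                 log2_grid(*c_range, coarse_step),
                                 log2_grid(*g_range, coarse_step))
    ec, eg = math.log2(best_c), math.log2(best_g)
    fine_c = log2_grid(ec - coarse_step, ec + coarse_step, fine_step)
    fine_g = log2_grid(eg - coarse_step, eg + coarse_step, fine_step)
    return grid_search(score, fine_c, fine_g)


# Illustrative usage with a synthetic score that peaks at
# log2(C) = 3.25 and log2(gamma) = -4.5 (a stand-in for CV accuracy).
def toy_score(c, g):
    return -((math.log2(c) - 3.25) ** 2 + (math.log2(g) + 4.5) ** 2)


best_c, best_g = coarse_to_fine(toy_score)
```

Searching in the exponent rather than in the raw parameter value keeps the grid small while still covering several orders of magnitude, which is why the coarse pass is cheap relative to the refinement.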
Table I compares the classification accuracy of both the variants of the L1-SVM and our Sparse-TDA method with the multi-scale kernel TDA method. Note that the two SHREC’14 data sets are partitioned into 70/30 training/testing samples, whereas the OuTeX set is partitioned into 50/50 training/testing samples. The number of samples for each class is approximately the same in all the training sets. Consistent with the differences observed in the PIs among the classes, both the variants of our method perform slightly better than the kernel TDA method for the SHREC’14 real data set. On the other hand, our method is marginally worse than the kernel TDA method for both the SHREC’14 synthetic and OuTeX data sets, even though the accuracy increases slightly using nonlinear weighting. Both the L1-SVM variants are, however, inferior to the other methods by varying degrees for all the data sets.
Table II provides a comparison of the three methods in terms of the SVM-based classifier training time as measured on a laptop with a 2.4GHz Intel Core i5 CPU and 4 GB RAM. In the case of the L1-SVM and Sparse-TDA methods, the training time starts from the computation of the PIs. Not surprisingly, both our method variants are usually much faster than kernel TDA as they use smaller sets of selected features. As expected, the reduction in training time is greater with linear weighting than with nonlinear weighting. In fact, the Sparse-TDA method with linear weighting achieves about 46X speed-up for the SHREC’14 synthetic data set and roughly 45X speed-up for the OuTeX data set. However, there is no consistent speed-up for the SHREC’14 real data set owing to the fact that there are only 4-5 points in each PD, rendering the training of the kernel TDA method exceptionally fast. In contrast, there are 38-294 points and 127-299 points in each PD for the SHREC’14 synthetic and OuTeX data sets, respectively. On the other hand, the L1-SVM method is not consistently fast due to the non-differentiability of the L1-regularized form, which leads to more difficulties in solving the optimization problem during training. For example, the training time of the L1-SVM method with linear weighting is more than three times as much as that of our counterpart method.
Fig. 3 shows the trends in improving the classification accuracy and reducing the classifier training time, respectively, as a function of increasing training/testing split for all the benchmark data sets. Consistent with the results reported in Table I, our classification accuracy is marginally inferior to that of the kernel TDA method for the SHREC’14 synthetic and OuTeX data sets. However, both our method variants marginally outperform the kernel method for the most challenging SHREC’14 real data set. The training time trends are also very similar to the results presented earlier in Table II, with more than an order of magnitude reduction for the SHREC’14 synthetic and the OuTeX data sets, and comparable values for the SHREC’14 real set. The increase in classifier training times with higher training/test splits is, however, slightly more for both our method variants as compared to the kernel TDA method due to the selection of more pixel samples as training size increases. Overall, we observe that for each of the benchmark data sets, at least one of our Sparse-TDA variants outperforms the kernel TDA method either in terms of classification accuracy or classifier training time. Moreover, Sparse-TDA outperforms both the L1-SVM variants in terms of classification accuracy for all the data sets, and achieves comparable computation time for the SHREC’14 synthetic and the OuTeX data sets, and a substantial reduction for the SHREC’14 real set.
5 Conclusions
In this paper, we present a new method, referred to as the Sparse-TDA algorithm, that provides a sparse realization of a TDA algorithm. More specifically, we combine optimized sparse sampling based on pivoted QR factorization with a state-of-the-art TDA method. Instead of persistence diagrams, we use a vector-based representation of persistent homology, called persistence images, with two different weighting functions to extract the topological features.
The results are promising on three benchmark multi-way classification problems pertaining to 3D meshes of human posture recognition, both for real and synthetic shapes, and image texture detection. Our method gives similar classification accuracy and substantial reduction in training times as compared to a kernel TDA method that was earlier evaluated on these data sets. It also provides better accuracy and similar training times as compared to popular SVM classifiers. Such performance is, therefore, expected to lay the foundation for online adaptation of TDA on challenging data sets with a large number of classes in response to changes in the availability of training samples.
In the future, we would like to further improve the accuracy of the Sparse-TDA method by designing our own weighting function for the persistence images. We would also like to derive theoretical performance guarantees based on the characteristics of the data sets, particularly the training sample size for each individual class. Last but not least, we plan to show the effectiveness of our method on other hard classification problems arising in robot visual perception and human face recognition.
Acknowledgments
We would like to thank The Boeing Company for sponsoring this work in part under contract # SSOW-BRT-W0714-0004 and Dr. Tom Hogan for helpful discussions. The views and opinions expressed in the paper are, however, solely of the authors and do not necessarily reflect those of the sponsor. We also would like to thank the anonymous reviewers for their constructive comments.
References
- [1] J. Reininghaus et al., “A stable multi-scale kernel for topological machine learning,” in Proc. IEEE Conf. Comp. Vis. Pattern Recog. (CVPR’15), Boston, MA, June 2015, pp. 4741–4748.
- [2] M. Gao et al., “Segmenting the papillary muscles and the trabeculae from high resolution cardiac CT through restoration of topological handles,” in Int. Conf. Info. Process. in Medical Imaging (IPMI’13). Springer, 2013, pp. 184–195.
- [3] M. Gameiro et al., “A topological measurement of protein compressibility,” Japan J. Ind. and Appl. Math., vol. 32, no. 1, pp. 1–17, 2015.
- [4] M. K. Chung et al., “Persistence diagrams of cortical surface data,” in Int. Conf. on Info. Process. in Medical Imaging (IPMI’09), 2009, pp. 386–397.
- [5] Y. Hiraoka et al., “Hierarchical structures of amorphous solids characterized by persistent homology,” Proc. Nat. Acad. of Sci., vol. 113, no. 26, pp. 7035–7040, 2016.
- [6] H. Edelsbrunner et al., “Topological persistence and simplification,” Discrete Comput. Geom., vol. 28, no. 4, pp. 511–533, 2002.
- [7] A. Zomorodian and G. Carlsson, “Computing persistent homology,” Discrete Comput. Geom., vol. 33, no. 2, pp. 249–274, 2005.
- [8] D. Pachauri et al., “Topology-based kernels with application to inference problems in Alzheimer’s disease,” IEEE Trans. Med. Imag., vol. 30, no. 10, pp. 1760–1770, 2011.
