# Ultra-Scalable Spectral Clustering and Ensemble Clustering

**Authors:** Dong Huang, Chang-Dong Wang, Jian-Sheng Wu, Jian-Huang Lai, Chee-Keong Kwoh

arXiv: 1903.01057 · 2019-03-06

## TL;DR

This paper introduces two algorithms for extremely large datasets, ultra-scalable spectral clustering (U-SPEC) and ultra-scalable ensemble clustering (U-SENC). Both have nearly linear time and space complexity and can robustly cluster ten-million-level datasets on a PC with 64GB memory.

## Contribution

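The paper presents two novel algorithms. U-SPEC makes spectral clustering scalable by building a sparse affinity sub-matrix between objects and a small set of representatives. U-SENC combines multiple U-SPEC clusterers in an ensemble to improve robustness while maintaining high efficiency.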

## Key findings

- Both algorithms can partition ten-million-level, nonlinearly separable datasets on a PC with 64GB memory
- Both algorithms have nearly linear time and space complexity
- Experiments on various large-scale datasets demonstrate their scalability and robustness

## Abstract

This paper focuses on scalability and robustness of spectral clustering for extremely large-scale datasets with limited resources. Two novel algorithms are proposed, namely, ultra-scalable spectral clustering (U-SPEC) and ultra-scalable ensemble clustering (U-SENC). In U-SPEC, a hybrid representative selection strategy and a fast approximation method for K-nearest representatives are proposed for the construction of a sparse affinity sub-matrix. By interpreting the sparse sub-matrix as a bipartite graph, the transfer cut is then utilized to efficiently partition the graph and obtain the clustering result. In U-SENC, multiple U-SPEC clusterers are further integrated into an ensemble clustering framework to enhance the robustness of U-SPEC while maintaining high efficiency. Based on the ensemble generation via multiple U-SPECs, a new bipartite graph is constructed between objects and base clusters and then efficiently partitioned to achieve the consensus clustering result. It is noteworthy that both U-SPEC and U-SENC have nearly linear time and space complexity, and are capable of robustly and efficiently partitioning ten-million-level nonlinearly-separable datasets on a PC with 64GB memory. Experiments on various large-scale datasets have demonstrated the scalability and robustness of our algorithms. The MATLAB code and experimental data are available at https://www.researchgate.net/publication/330760669.
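The two bipartite graphs named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation, and it rests on several assumptions: the function names `sparse_affinity` and `object_cluster_graph` are hypothetical, the representatives are taken as given (the hybrid selection step is not shown), an exact brute-force search stands in for the paper's fast K-nearest-representative approximation, the Gaussian kernel and its default bandwidth `sigma` (the mean of the retained distances) are illustrative choices, and the transfer cut partitioning step is omitted.

```python
import math


def sparse_affinity(points, reps, k, sigma=None):
    """Sparse object-by-representative affinity, one {rep_index: weight} dict per object.

    Each object keeps only its k nearest representatives, so the matrix has
    k nonzeros per row. Brute-force search is used here; this is an assumption
    standing in for the paper's fast approximation.
    """
    rows = []
    for x in points:
        # (distance, representative index) pairs, k smallest distances first
        nearest = sorted((math.dist(x, r), j) for j, r in enumerate(reps))[:k]
        rows.append(nearest)
    if sigma is None:
        # Assumed bandwidth: mean of all retained distances (1.0 if that is zero)
        kept = [d for row in rows for d, _ in row]
        sigma = (sum(kept) / len(kept)) or 1.0
    return [
        {j: math.exp(-(d * d) / (2.0 * sigma * sigma)) for d, j in row}
        for row in rows
    ]


def object_cluster_graph(base_labelings):
    """Binary bipartite graph between objects and base clusters.

    base_labelings holds one label list per base clusterer. Every distinct
    (clusterer, label) pair becomes one column. Returns the column indices
    linked to each object and the total number of columns.
    """
    n_objects = len(base_labelings[0])
    column_of = {}  # (clusterer index, label) -> column index
    rows = [[] for _ in range(n_objects)]
    for m, labels in enumerate(base_labelings):
        for i, label in enumerate(labels):
            col = column_of.setdefault((m, label), len(column_of))
            rows[i].append(col)
    return rows, len(column_of)
```

With `N` objects, `p` representatives, and `K` neighbours per object, the first structure stores `N * K` weights instead of `N * N`, which is what keeps memory use close to linear in `N`. The second structure stores one entry per object per base clusterer.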

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.01057/full.md

## Figures

32 figures with captions in the complete paper: https://tomesphere.com/paper/1903.01057/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/1903.01057/full.md

---
Source: https://tomesphere.com/paper/1903.01057