Quantifying uncertainty in spectral clusterings: expectations for perturbed and incomplete data
Jürgen Dölz, Jolanda Weygandt

TL;DR
This paper develops a mathematical framework using random set theory to quantify and analyze the uncertainty in spectral clustering results caused by data corruption, measurement errors, and incompleteness.
Contribution
It introduces an approach for estimating expected clusterings under data uncertainty and analyzes the consistency of these estimates as the number of data points and the number of Monte Carlo samples grow.
Findings
Proposes Monte Carlo methods for uncertainty quantification in spectral clustering.
Analyzes the consistency of uncertainty measures with increasing data and samples.
Provides numerical experiments demonstrating the effectiveness of the proposed framework.
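The Monte Carlo step behind these findings can be summarized by the generic sample-mean estimator. This is the textbook Monte Carlo error bound, not the paper's own consistency result, which also covers the limit of infinitely many data points. Here $Q(\omega)$ is an assumed placeholder for any quantity of interest computed from one clustering of a random realization $\omega$ of the corrupted data, and $\omega_1, \dots, \omega_M$ are independent samples:

```latex
\hat{Q}_M = \frac{1}{M} \sum_{m=1}^{M} Q(\omega_m),
\qquad
\mathbb{E}\big[\hat{Q}_M\big] = \mathbb{E}[Q],
\qquad
\sqrt{\mathbb{E}\Big[\big(\hat{Q}_M - \mathbb{E}[Q]\big)^2\Big]} = \frac{\sqrt{\operatorname{Var}[Q]}}{\sqrt{M}} .
```

For example, if $Q$ is the indicator that two given points land in the same cluster, then $\operatorname{Var}[Q] \le 1/4$, so the root-mean-square error is at most $1/(2\sqrt{M})$; with $M = 100$ samples this bound is $0.05$.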
Abstract
Spectral clustering is a popular unsupervised learning technique which is able to partition unlabelled data into disjoint clusters of distinct shapes. However, the data under consideration are often experimental data, implying that the data are subject to measurement errors and measurements may even be lost or invalid. These uncertainties in the corrupted input data induce corresponding uncertainties in the resulting clusters, and the clusterings thus become unreliable. Modelling the uncertainties as random processes, we discuss a mathematical framework based on random set theory for the computational Monte Carlo approximation of statistically expected clusterings in the case of corrupted, i.e., perturbed, incomplete, and possibly even additional, data. We propose several computationally accessible quantities of interest and analyze their consistency in the infinite data point and infinite…
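A minimal sketch of the Monte Carlo idea for perturbed and incomplete data follows. It is not the authors' random-set framework; it assumes Gaussian measurement noise, independent random dropout of points, a Gaussian-affinity normalized spectral clustering (Ng-Jordan-Weiss style, with a plain k-means step), and, as the quantity of interest, the probability that two points share a cluster given that both were observed. This pairwise co-membership is used because it does not depend on how cluster labels are numbered in each sample. The function names `spectral_labels` and `expected_comembership` and all parameter values are hypothetical.

```python
import numpy as np


def spectral_labels(X, k, sigma=1.0, n_iter=50, rng=None):
    """Normalized spectral clustering with a plain k-means step (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    # Gaussian affinity matrix with zero diagonal.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order: keep the k smallest eigenvectors.
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, :k]
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)  # row-normalize the embedding
    # k-means on the embedded rows with farthest-first initialization.
    centers = [Y[rng.integers(len(Y))]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = Y[labels == j].mean(axis=0)
    return labels


def expected_comembership(X, k, n_samples=100, noise=0.1, p_drop=0.1, sigma=1.0, seed=0):
    """Monte Carlo estimate of P(i and j share a cluster | i and j both observed)."""
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    together = np.zeros((n, n))  # samples where i, j were observed and co-clustered
    observed = np.zeros((n, n))  # samples where i, j were both observed
    for _ in range(n_samples):
        idx = np.flatnonzero(rng.random(n) >= p_drop)  # incomplete data: random dropout
        X_pert = X[idx] + noise * rng.standard_normal((len(idx), dim))  # measurement error
        labels = spectral_labels(X_pert, k, sigma=sigma, rng=rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        observed[np.ix_(idx, idx)] += 1.0
    with np.errstate(invalid="ignore", divide="ignore"):
        return together / observed  # NaN where a pair was never observed together


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
    P = expected_comembership(X, k=2, n_samples=50)
    print(P[0, 1], P[0, 39])  # within-blob pair vs. cross-blob pair
```

For two well-separated blobs, within-blob entries of the estimated matrix should be close to 1 and cross-blob entries close to 0; entries strictly between those values flag point pairs whose cluster assignment is unreliable under the assumed noise and dropout model. Its sampling error shrinks like $1/\sqrt{M}$ in the number of samples $M$.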
