A Graph-based Approach to Estimating the Number of Clusters in High-dimensional Settings

Yichuan Bai; Lynna Chu

arXiv:2402.15600 · stat.ME · June 13, 2025 · 1 citation

Open Access

TL;DR

This paper introduces a graph-based, non-parametric method for estimating the number of clusters in a dataset. In simulations it outperforms existing methods, especially in high dimensions, and its utility is illustrated on an imaging dataset and an RNA-seq dataset.

Contribution

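The paper presents a novel graph-based statistic for estimating the number of clusters. The statistic applies to data of any dimension, is computationally efficient, can be paired with any clustering technique, and is shown asymptotically to be selection-consistent.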

Findings

01 Outperforms existing methods in simulations, especially in the high-dimensional setting

02 Utility illustrated on an imaging dataset and an RNA-seq dataset

03 Asymptotic theory establishes selection consistency

Abstract

We consider the problem of estimating the number of clusters (k) in a dataset. We propose a non-parametric approach to the problem that utilizes similarity graphs to construct a robust statistic that effectively captures similarity information among observations. This graph-based statistic is applicable to datasets of any dimension, is computationally efficient to obtain, and can be paired with any kind of clustering technique. Asymptotic theory is developed to establish the selection consistency of the proposed approach. Simulation studies demonstrate that the graph-based statistic outperforms existing methods for estimating k, especially in the high-dimensional setting. We illustrate its utility on an imaging dataset and an RNA-seq dataset.
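The abstract does not spell out the statistic itself, so the following is only a minimal sketch of the general idea that a similarity graph carries cluster information. It is not the authors' method. Everything in it is an assumption made for illustration: the choice of a k-nearest-neighbor graph under Euclidean distance, the within-cluster edge fraction as the summary quantity, the toy two-cluster data, and the hypothetical helper names `knn_edges` and `within_cluster_fraction`.

```python
import math
import random


def knn_edges(points, k):
    """Return the undirected edge set of a k-nearest-neighbor graph (Euclidean).

    Hypothetical helper for illustration; the paper's graph may differ.
    """
    edges = set()
    for i, p in enumerate(points):
        # Rank every other observation by its distance to observation i.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        # Connect observation i to its k closest observations.
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def within_cluster_fraction(edges, labels):
    """Fraction of graph edges whose two endpoints share a cluster label.

    Illustrative quantity only, not the statistic proposed in the paper.
    """
    same = sum(1 for i, j in edges if labels[i] == labels[j])
    return same / len(edges)


# Toy data (assumed): two well-separated Gaussian clusters in 50 dimensions.
rng = random.Random(0)
dim, n_per_cluster = 50, 20
points = [
    [rng.gauss(center, 1.0) for _ in range(dim)]
    for center in (0.0, 10.0)
    for _ in range(n_per_cluster)
]
true_labels = [0] * n_per_cluster + [1] * n_per_cluster

edges = knn_edges(points, k=3)
frac_true = within_cluster_fraction(edges, true_labels)
print(f"edges: {len(edges)}, within-cluster fraction: {frac_true:.2f}")
```

Because the graph is built from pairwise distances alone, the same construction works for data of any dimension, and the labels passed to `within_cluster_fraction` can come from any clustering technique. In this toy case the two clusters are far apart relative to their spread, so the true labelling should keep essentially every edge inside a cluster.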

Taxonomy

Topics: Complex Network Analysis Techniques · Graph Labeling and Dimension Problems · Graph theory and applications