Clustering through Feature Space Sequence Discovery and Analysis

Shi Guobin

arXiv:2212.00996 · cs.LG · December 5, 2022

TL;DR

This paper introduces DCSA, a parameter-free clustering algorithm that explores high-dimensional data by converting it into sequences and identifying clusters through path analysis, demonstrating robustness and interpretability.

Contribution

The paper presents a novel, simple, and efficient sequence-based clustering method that does not require prior parameters and works effectively on high-dimensional data.

Findings

01. Robust clustering across various real-world datasets

02. Effective in high-dimensional spaces of up to 20,531 dimensions

03. Provides visually interpretable results

Abstract

Identifying patterns in high-dimensional data without a priori knowledge is an important task in data science. This paper proposes a simple and efficient nonparametric algorithm, Data Convert to Sequence Analysis (DCSA), which dynamically explores each point in the feature space without repetition, yielding a directed Hamiltonian path. Based on change point analysis theory, the sequence corresponding to the path is cut into several fragments to achieve clustering. Experiments on real-world datasets from different fields, with dimensions ranging from 4 to 20,531, confirm that the method is robust and that its results are visually interpretable.
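The abstract describes two stages: walking through every point once to obtain a sequence, then cutting that sequence into fragments. The following is a minimal sketch of that general idea, not the paper's actual procedure. It assumes a greedy nearest-unvisited-neighbor walk with Euclidean distance for the path, and it replaces the paper's change point analysis with a simple stand-in rule (cut wherever a step is longer than `factor` times the median step). The function names `greedy_path` and `cut_sequence` and the `factor` parameter are hypothetical and do not come from the paper.

```python
import numpy as np


def greedy_path(X, start=0):
    """Visit every row of X exactly once, always moving to the nearest
    unvisited point (assumed exploration rule). Returns the visiting
    order and the length of each step along the path."""
    n = len(X)
    visited = np.zeros(n, dtype=bool)
    visited[start] = True
    order, steps = [start], []
    current = start
    for _ in range(n - 1):
        d = np.linalg.norm(X - X[current], axis=1)
        d[visited] = np.inf          # never revisit a point
        nxt = int(np.argmin(d))
        order.append(nxt)
        steps.append(float(d[nxt]))
        visited[nxt] = True
        current = nxt
    return np.array(order), np.array(steps)


def cut_sequence(order, steps, factor=5.0):
    """Cut the path wherever a step exceeds factor * median(steps).
    This threshold is a stand-in for the paper's change point analysis."""
    threshold = factor * np.median(steps)
    labels = np.empty(len(order), dtype=int)
    cluster = 0
    labels[order[0]] = cluster
    for i, step in enumerate(steps):
        if step > threshold:         # unusually long jump: start a new fragment
            cluster += 1
        labels[order[i + 1]] = cluster
    return labels


# Two groups of points on a line, far apart from each other.
X_demo = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0],
                   [100.0, 0.0], [101.0, 0.0], [102.0, 0.0], [103.0, 0.0]])
order_demo, steps_demo = greedy_path(X_demo, start=0)
print(cut_sequence(order_demo, steps_demo))
```

On this toy input the only long step is the jump between the two groups, so the sequence is cut once and each group gets its own label. Unlike DCSA, which the paper describes as needing no prior parameters, this stand-in rule depends on the chosen `factor`.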

Taxonomy

Topics: Gene expression and cancer classification · Evolutionary Algorithms and Applications · Advanced Multi-Objective Optimization Algorithms