Clustering through Feature Space Sequence Discovery and Analysis
Shi Guobin

TL;DR
This paper introduces DCSA, a parameter-free clustering algorithm that explores high-dimensional data by converting it into sequences and identifying clusters through path analysis, demonstrating robustness and interpretability.
Contribution
The paper presents a novel, simple, and efficient sequence-based clustering method that does not require prior parameters and works effectively on high-dimensional data.
Findings
Robust clustering across various real-world datasets
Effective in high-dimensional spaces up to 20,531 dimensions
Provides visually interpretable results
Abstract
Identifying patterns in high-dimensional data without a priori knowledge is an important task in data science. This paper proposes a simple and efficient nonparametric algorithm, Data Convert to Sequence Analysis (DCSA), which dynamically explores each point in the feature space without repetition, yielding a directed Hamiltonian path. Based on change point analysis theory, the sequence corresponding to the path is cut into several fragments to achieve clustering. Experiments on real-world datasets from different fields, with dimensions ranging from 4 to 20,531, confirm that the method is robust and that its results are visually interpretable.
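The abstract describes two stages: ordering the points along a path that visits each one exactly once, then cutting the resulting sequence at change points. It does not say how the path is built or which change point test is used, so the Python sketch below is only an illustration under stated assumptions, not the author's DCSA. It assumes a greedy nearest-unvisited-neighbor traversal with Euclidean distance, and it stands in for change point analysis with a simple rule that cuts wherever a step is much longer than the median step. The names `greedy_path`, `cut_sequence`, and `factor` are hypothetical, and `factor` is a tunable that a parameter-free method would presumably not need.

```python
import numpy as np

def greedy_path(X, start=0):
    """Visit every point once, always stepping to the nearest unvisited point.

    Assumption: greedy traversal with Euclidean distance; the paper's actual
    path construction is not given in the abstract.
    """
    n = len(X)
    visited = np.zeros(n, dtype=bool)
    visited[start] = True
    order, steps = [start], []
    current = start
    for _ in range(n - 1):
        dist = np.linalg.norm(X - X[current], axis=1)
        dist[visited] = np.inf          # never revisit a point
        nxt = int(np.argmin(dist))
        order.append(nxt)
        steps.append(float(dist[nxt]))  # length of the step just taken
        visited[nxt] = True
        current = nxt
    return order, np.array(steps)

def cut_sequence(order, steps, factor=10.0):
    """Cut the path wherever a step exceeds factor * median step length.

    Assumption: a crude stand-in for change point analysis; `factor` is a
    hypothetical parameter introduced only for this illustration.
    """
    labels = np.empty(len(order), dtype=int)
    threshold = factor * np.median(steps)
    cluster = 0
    labels[order[0]] = cluster
    for i, step in enumerate(steps):
        if step > threshold:            # a long jump starts a new fragment
            cluster += 1
        labels[order[i + 1]] = cluster
    return labels

# Synthetic demo data (not from the paper): two well-separated blobs in 4-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 4)),
               rng.normal(10.0, 0.1, size=(20, 4))])
order, steps = greedy_path(X)
labels = cut_sequence(order, steps)
```

Plotting `steps` against position in the path gives a one-dimensional view of the data in which long jumps mark candidate cluster boundaries, which is one plausible reading of the visual interpretability the abstract mentions.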
Taxonomy
Topics: Gene expression and cancer classification · Evolutionary Algorithms and Applications · Advanced Multi-Objective Optimization Algorithms
