*K-means and Cluster Models for Cancer Signatures
Zura Kakushadze, Willie Yu

TL;DR
This paper introduces *K-means, a statistically deterministic K-means clustering method, and applies it to cancer genome data. The method reveals distinct cancer signatures and structures shared among certain cancer types, and has potential applications beyond biology.
Contribution
It presents a novel, computationally efficient K-means approach that extracts cancer signatures without nonnegative matrix factorization (NMF), and demonstrates it on a large set of published cancer genome samples.
Findings
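Three cancer types (liver cancer, lung cancer and renal cell carcinoma) stand out and have no cluster-like structures.
Two clusters have especially high within-cluster correlations with the 11 other cancer types, indicating common underlying structures.
*K-means' computational cost is a fraction of NMF's.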
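The findings are stated in terms of within-cluster correlations. As a hedged illustration only, assume each clustered item is a numeric vector and take "within-cluster correlation" to mean the average pairwise Pearson correlation among a cluster's members. That definition is our assumption, and the paper's exact measure may differ. The helper names `pearson` and `within_cluster_correlation` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return sxy / (sxx * syy) ** 0.5


def within_cluster_correlation(vectors, labels):
    """Average pairwise correlation per cluster (assumed definition)."""
    clusters = {}
    for vec, lab in zip(vectors, labels):
        clusters.setdefault(lab, []).append(vec)
    result = {}
    for lab, members in clusters.items():
        pairs = list(combinations(members, 2))
        # Singleton clusters have no pairs, so their score is left undefined.
        result[lab] = mean(pearson(a, b) for a, b in pairs) if pairs else float("nan")
    return result
```

For example, grouping `[1, 2, 3]` with `[2, 4, 6]` gives a cluster score of 1.0, while grouping `[1, 2, 3]` with `[3, 2, 1]` gives -1.0.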
Abstract
We present the *K-means clustering algorithm and source code, obtained by expanding the statistical clustering methods that https://ssrn.com/abstract=2802753 applied to quantitative finance. *K-means is statistically deterministic and does not require specifying initial centers, etc. We apply *K-means to extracting cancer signatures from genome data without using nonnegative matrix factorization (NMF). *K-means' computational cost is a fraction of NMF's. Using 1,389 published samples for 14 cancer types, we find that 3 cancers (liver cancer, lung cancer and renal cell carcinoma) stand out and do not have cluster-like structures. Two clusters have especially high within-cluster correlations with the 11 other cancers, indicating common underlying structures. Our approach opens a novel avenue for studying such structures. *K-means is universal and can be applied in other fields. We discuss some potential applications in…
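The abstract says that *K-means is statistically deterministic without requiring initial centers. The paper's exact procedure is not reproduced here. The sketch below assumes one way such determinism can arise: run ordinary K-means many times from random initial centers, then keep the partition that occurs most often. The names `kmeans`, `canonical` and `most_frequent_clustering` are hypothetical, and the code is plain Lloyd's algorithm with no dependencies.

```python
import random
from collections import Counter


def kmeans(points, k, rng, max_iter=100):
    """Lloyd's K-means from k randomly sampled initial centers; returns labels."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Move each center to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels


def canonical(labels):
    """Renumber cluster IDs by first appearance so permuted labelings compare equal."""
    mapping = {}
    return tuple(mapping.setdefault(lab, len(mapping)) for lab in labels)


def most_frequent_clustering(points, k, runs=100, seed=0):
    """Hypothetical aggregation: run K-means `runs` times and return the modal partition."""
    rng = random.Random(seed)  # fixed seed makes this sketch reproducible
    counts = Counter(canonical(kmeans(points, k, rng)) for _ in range(runs))
    partition, hits = counts.most_common(1)[0]
    return partition, hits / runs
```

Because `canonical` renumbers labels in order of first appearance, runs that find the same grouping under permuted cluster IDs are counted as the same partition. The returned frequency shows how stable the chosen clustering is across runs.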
Taxonomy
Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Machine Learning in Bioinformatics
