*K-means and Cluster Models for Cancer Signatures

Zura Kakushadze; Willie Yu

arXiv:1703.00703 · q-bio.GN · October 5, 2017 · 1 cites

TL;DR

This paper introduces *K-means, a statistically deterministic clustering method, and applies it to cancer genome data. Some cancer types turn out to share common underlying structures while others show no cluster-like structure, and the method may also apply outside biology.

Contribution

It presents *K-means, a clustering approach that extracts cancer signatures without nonnegative matrix factorization (NMF) at a fraction of NMF's computational cost, and applies it to 1,389 published samples across 14 cancer types.

Findings

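01. Three cancers (liver cancer, lung cancer and renal cell carcinoma) stand out and do not have cluster-like structures.

02. Two clusters have especially high within-cluster correlations with 11 other cancers, indicating common underlying structures.

03. *K-means' computational cost is a fraction of NMF's.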

Abstract

We present the *K-means clustering algorithm and source code by expanding statistical clustering methods applied in https://ssrn.com/abstract=2802753 to quantitative finance. *K-means is statistically deterministic without specifying initial centers, etc. We apply *K-means to extracting cancer signatures from genome data without using nonnegative matrix factorization (NMF). *K-means' computational cost is a fraction of NMF's. Using 1,389 published samples for 14 cancer types, we find that 3 cancers (liver cancer, lung cancer and renal cell carcinoma) stand out and do not have cluster-like structures. Two clusters have especially high within-cluster correlations with 11 other cancers, indicating common underlying structures. Our approach opens a novel avenue for studying such structures. *K-means is universal and can be applied in other fields. We discuss some potential applications in…
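The abstract does not spell out how *K-means avoids depending on initial centers. As a rough illustration only, and not the authors' algorithm, the sketch below shows one simple way a clustering can be made reproducible in a statistical sense: run ordinary K-means many times from random initial centers and keep the partition that occurs most often. All function names, the toy data, and the parameters `runs` and `seed` are assumptions introduced for this example.

```python
import random
from collections import Counter


def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's K-means from k randomly sampled initial centers."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest center (ties go to the lower index).
        new_labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def canonical(labels):
    """Relabel clusters by order of first appearance so equal partitions compare equal."""
    mapping = {}
    return tuple(mapping.setdefault(lab, len(mapping)) for lab in labels)


def most_frequent_clustering(points, k, runs=200, seed=0):
    """Hypothetical helper: return (partition, count) for the most common outcome."""
    rng = random.Random(seed)
    counts = Counter(canonical(kmeans_once(points, k, rng)) for _ in range(runs))
    return counts.most_common(1)[0]


# Toy data (made up for this example): two well-separated groups in the plane.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, count = most_frequent_clustering(points, k=2)
print(labels, f"seen in {count} of 200 runs")
```

Individual runs can disagree when the data are less cleanly separated, which is why the sketch votes over many runs instead of trusting any single one. The fixed `seed` only makes this toy example repeatable; it is not part of the voting idea itself.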

Taxonomy

Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Machine Learning in Bioinformatics