KiWi: A Scalable Subspace Clustering Algorithm for Gene Expression Analysis
Obi L. Griffith, Byron J. Gao, Mikhail Bilenky, Yuliya Prichyna, Martin Ester, Steven J.M. Jones

TL;DR
KiWi is a scalable subspace clustering algorithm for gene expression data that efficiently discovers biologically relevant gene groups, including small twig clusters, and handles large datasets with improved computational performance.
Contribution
KiWi introduces a scalable OPSM-based subspace clustering method capable of identifying small, tightly co-regulated gene groups in large gene expression datasets.
Findings
Correctly groups redundant probes and experiments with clinical annotations
Differentiates real promoter sequences from controls
Shows strong association with cis-regulatory motifs
Abstract
Subspace clustering has gained increasing popularity in the analysis of gene expression data. Among subspace cluster models, the recently introduced order-preserving sub-matrix (OPSM) has demonstrated high promise. An OPSM, essentially a pattern-based subspace cluster, is a subset of rows and columns in a data matrix for which all the rows induce the same linear ordering of columns. Existing OPSM discovery methods do not scale well to increasingly large expression datasets. In particular, twig clusters having few genes and many experiments incur explosive computational costs and are completely pruned off by existing methods. However, it is of particular interest to determine small groups of genes that are tightly coregulated across many conditions. In this paper, we present KiWi, an OPSM subspace clustering algorithm that is scalable to massive datasets, capable of discovering twig…
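To make the OPSM definition in the abstract concrete, here is a minimal Python sketch that checks whether a chosen set of rows (genes) and columns (experiments) forms an order-preserving sub-matrix. The function names `is_opsm` and `induced_order` and the toy matrix are hypothetical and used only for illustration. The sketch assumes a strict linear ordering, so tied values within a row are rejected. It only tests the OPSM property and is not KiWi's search procedure, which the truncated abstract does not describe.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def induced_order(row: Sequence[float], cols: Sequence[int]) -> tuple:
    """Return the selected columns sorted by this row's values (ascending)."""
    return tuple(sorted(cols, key=lambda c: row[c]))


def is_opsm(matrix: Matrix, rows: Sequence[int], cols: Sequence[int]) -> bool:
    """Check that every selected row induces the same linear ordering of the selected columns.

    Assumption (hypothetical helper): ties are not allowed, so values must be
    strictly increasing along the shared column ordering.
    """
    if not rows or not cols:
        return False
    # Take the ordering induced by the first selected row as the reference.
    reference = induced_order(matrix[rows[0]], cols)
    for r in rows:
        values = [matrix[r][c] for c in reference]
        # Any non-increase along the reference ordering means this row
        # induces a different ordering (or contains a tie).
        if any(a >= b for a, b in zip(values, values[1:])):
            return False
    return True


# Toy expression matrix: rows are genes, columns are experiments (made-up values).
expr = [
    [1.0, 5.0, 3.0, 0.2],  # gene 0
    [2.0, 9.0, 4.0, 7.0],  # gene 1
    [0.5, 8.0, 1.5, 0.1],  # gene 2
]

print(is_opsm(expr, [0, 1, 2], [0, 1, 2]))     # all three genes order columns as 0 < 2 < 1
print(is_opsm(expr, [0, 1, 2], [0, 1, 2, 3]))  # gene 1 breaks the ordering on column 3
print(is_opsm(expr, [0, 2], [0, 1, 2, 3]))     # fewer genes, more experiments
```

The last call shows the trade-off behind twig clusters: dropping gene 1 lets the remaining two genes share an ordering across all four experiments, yielding a group with few genes but more conditions.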
Taxonomy
Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Machine Learning in Bioinformatics
