How complex is the microarray dataset? A novel data complexity metric for biological high-dimensional microarray data
Zhendong Sha, Li Zhu, Zijun Jiang, Yuanzhu Chen, Ting Hu

TL;DR
This paper introduces depth, a new data complexity measure for biological microarray datasets that is robust to irrelevant features and captures complexity arising from feature interactions.
Contribution
The paper proposes a novel complexity metric, depth, that improves robustness and provides new insights into microarray data complexity analysis.
Findings
Depth outperforms existing complexity measures on synthetic data.
A single gene feature can explain over 90% of model performance.
Genotype data is more complex to model than gene-expression data.
Abstract
Data complexity analysis quantifies how hard it is to construct a predictive model on a given dataset. However, the effectiveness of existing data complexity measures can be undermined by irrelevant features and feature interactions in biological microarray data. We propose a novel data complexity measure, depth, that leverages an evolutionary-inspired feature selection algorithm to quantify the complexity of microarray data. By examining feature subsets of varying sizes, the approach offers a new perspective on data complexity analysis. Unlike traditional metrics, depth is robust to irrelevant features and effectively captures complexity stemming from feature interactions. On synthetic microarray data, depth outperforms existing methods both in robustness to irrelevant features and in identifying complexity arising from feature interactions. Applied to case-control genotype and…
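To make "examining feature subsets of varying sizes" concrete, the Python sketch below computes a toy, depth-like score: the best accuracy reachable with k features for each k, and the smallest k that reaches 90% of the best. This is an illustrative assumption, not the authors' definition of depth. The helper names (lookup_accuracy, best_accuracy_by_size, depth_proxy, all_binary_rows), the majority-vote lookup-table classifier, and the 0.9 threshold are all hypothetical, and exhaustive subset search is swapped in for the paper's evolutionary feature selection, which only works for a handful of features.

```python
# Hypothetical toy proxy, NOT the paper's depth metric.
# Exhaustive subset search replaces the paper's evolutionary feature selection.
from collections import Counter, defaultdict
from itertools import combinations


def lookup_accuracy(X, y, subset):
    """Training accuracy of a majority-vote lookup table keyed on the features in `subset`."""
    groups = defaultdict(Counter)
    for row, label in zip(X, y):
        groups[tuple(row[j] for j in subset)][label] += 1
    # Each group of identical feature values predicts its majority label.
    correct = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return correct / len(y)


def best_accuracy_by_size(X, y, max_size):
    """Best lookup-table accuracy for every subset size 1..max_size (exhaustive search)."""
    n_features = len(X[0])
    return {
        k: max(lookup_accuracy(X, y, s) for s in combinations(range(n_features), k))
        for k in range(1, max_size + 1)
    }


def depth_proxy(X, y, max_size, fraction=0.9):
    """Smallest subset size whose best accuracy reaches `fraction` of the best overall accuracy."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    curve = best_accuracy_by_size(X, y, max_size)
    target = fraction * max(curve.values())
    depth = min(k for k, acc in curve.items() if acc >= target)
    return depth, curve


def all_binary_rows(n_features):
    """Every combination of n binary features, as a toy genotype-like design matrix."""
    return [[(i >> j) & 1 for j in range(n_features)] for i in range(2 ** n_features)]


if __name__ == "__main__":
    X = all_binary_rows(6)
    y_interaction = [row[0] ^ row[1] for row in X]  # label depends only on a pairwise interaction
    y_main = [row[2] for row in X]                    # label depends on a single feature
    print(depth_proxy(X, y_interaction, max_size=3))  # (2, {1: 0.5, 2: 1.0, 3: 1.0})
    print(depth_proxy(X, y_main, max_size=3))         # (1, {1: 1.0, 2: 1.0, 3: 1.0})
```

On the XOR-style labels no single feature beats chance, so the proxy reports 2; a label driven by one feature reports 1. Adding more irrelevant features (for example all_binary_rows(8)) leaves the XOR result at 2. Training accuracy of a lookup table will overfit on real data with many features, which this toy deliberately ignores.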
Taxonomy
Topics: Gene expression and cancer classification · Evolutionary Algorithms and Applications · Bioinformatics and Genomic Networks
