Algorithm for Finding Optimal Gene Sets in Microarray Prediction
J.M. Deutsch

TL;DR
This paper introduces a replication algorithm that identifies minimal gene sets for accurate cancer classification using microarray data, reducing the number of genes needed while maintaining perfect classification accuracy.
Contribution
The paper presents a novel replication algorithm that evolves ensembles of predictors to find optimal gene sets for cancer diagnosis, demonstrating significant gene reduction.
Findings
Reduced gene set from 96 to 15 for childhood cancers
Achieved perfect classification on test data
Validated method on leukemia and childhood cancer datasets
Abstract
Motivation: Microarray data has recently been shown to be efficacious in distinguishing closely related cell types that often appear in the diagnosis of cancer. It is useful to determine the minimum number of genes needed to make such a diagnosis, both for clinical use and to determine the importance of specific genes for cancer. Here a replication algorithm is used for this purpose. It evolves an ensemble of predictors, each using a different combination of genes, to generate a set of optimal predictors. Results: We apply this method to the leukemia data of the Whitehead/MIT group, which attempts to differentially diagnose two kinds of leukemia, and also to the data of Khan et al. to distinguish four different kinds of childhood cancers. In the latter case we were able to reduce the number of genes needed from 96 down to 15, while at the same time being able to perfectly classify all of…
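The abstract describes the method only at a high level: an ensemble of predictors, each built on its own gene subset, is evolved toward small, accurate sets. The Python sketch below illustrates that general idea and should not be read as the paper's actual algorithm. Everything specific in it is an assumption made for illustration: the leave-one-out nearest-neighbour scoring, the `gene_penalty` term that favours smaller subsets, the add-or-remove mutation rule, the keep-the-best selection step, and all function and parameter names.

```python
import random

def loo_errors(genes, samples, labels):
    """Count leave-one-out nearest-neighbour errors using only the chosen genes."""
    errors = 0
    for i, x in enumerate(samples):
        best_dist, best_label = None, None
        for j, y in enumerate(samples):
            if i == j:
                continue
            dist = sum((x[g] - y[g]) ** 2 for g in genes)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        if best_label != labels[i]:
            errors += 1
    return errors

def score(genes, samples, labels, gene_penalty=0.01):
    """Lower is better: misclassifications plus a small cost per gene (assumed form)."""
    return loo_errors(genes, samples, labels) + gene_penalty * len(genes)

def mutate(genes, n_genes, rng):
    """Hypothetical mutation: drop one gene or add one random gene."""
    genes = set(genes)
    if len(genes) > 1 and rng.random() < 0.5:
        genes.remove(rng.choice(sorted(genes)))
    else:
        genes.add(rng.randrange(n_genes))
    return frozenset(genes)

def evolve(samples, labels, pop_size=20, generations=30, init_size=5, seed=0):
    """Evolve an ensemble of gene subsets; returns the final population, best first."""
    rng = random.Random(seed)
    n_genes = len(samples[0])
    population = [frozenset(rng.sample(range(n_genes), init_size))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Every predictor replicates with a mutation; parents compete with children.
        children = [mutate(g, n_genes, rng) for g in population]
        pool = sorted(population + children,
                      key=lambda g: score(g, samples, labels))
        population = pool[:pop_size]
    return population
```

Here `samples` is a list of expression vectors (one list of floats per sample) and `labels` holds the matching class labels. Calling `evolve(samples, labels)[0]` returns the best-scoring gene subset found under these assumptions. Because parents stay in the selection pool, the best score never gets worse from one generation to the next.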
Taxonomy
Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Genetics, Bioinformatics, and Biomedical Research
