A Study of Feature Selection and Extraction Algorithms for Cancer Subtype Prediction

Vaibhav Sinha, Siladitya Dash, Nazma Naskar, and Sk Md Mosaddek Hossain

arXiv:2109.14648·cs.LG·October 1, 2021


TL;DR

This paper evaluates feature selection algorithms for cancer subtype prediction and shows that applying them sequentially reduces computational cost and can improve model performance on high-dimensional omics data.

Contribution

The paper applies existing feature selection methods sequentially, an approach that lowers computational cost and improves predictive performance for cancer subtype classification.

Findings

01. Sequential feature selection reduces computational cost.

02. Reducing the number of features with dimension reduction techniques can improve model performance in some cases.

03. Comprehensive data analysis and visualization support these findings.

Abstract

In this work, we study and analyze different feature selection algorithms that can be used to classify cancer subtypes from highly variable, high-dimensional data. We apply three different feature selection methods to five types of cancer, each with two separate omics data types. We show that the existing feature selection methods are computationally expensive when applied individually. Instead, we apply these algorithms sequentially, which lowers the computational cost and improves predictive performance. We further show that reducing the number of features with dimension reduction techniques can improve the performance of machine learning models in some cases. We support our findings through comprehensive data analysis and visualization.
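The abstract's central idea is to apply feature selection methods one after another, so that each later (typically costlier) method sees only the features that survived the earlier ones. The sketch below illustrates that cascade under stated assumptions. This summary does not name the paper's three methods, so the variance filter and absolute-correlation filter here, along with every function name and the toy data, are hypothetical stand-ins and not the authors' pipeline.

```python
"""Minimal sketch of sequential (cascaded) feature selection.

Assumption: the two filters below are illustrative stand-ins, not the
three methods evaluated in the paper. Each stage scores only the
features that survived the previous stage.
"""
from statistics import mean, pvariance
from typing import List, Sequence

Matrix = List[List[float]]  # rows = samples, columns = features


def column(X: Matrix, j: int) -> List[float]:
    """Return feature j as a list of values across all samples."""
    return [row[j] for row in X]


def variance_filter(X: Matrix, keep: int) -> List[int]:
    """Stage 1 (cheap, unsupervised): keep the `keep` highest-variance features."""
    n_features = len(X[0])
    variances = [pvariance(column(X, j)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:keep])


def abs_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return abs(cov / (sx * sy))


def correlation_filter(X: Matrix, y: Sequence[float],
                       candidates: List[int], keep: int) -> List[int]:
    """Stage 2 (supervised): score only the stage-1 survivors against the label.

    In a real pipeline a costlier method (e.g. a wrapper) could sit here.
    """
    scores = {j: abs_correlation(column(X, j), y) for j in candidates}
    ranked = sorted(candidates, key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


def sequential_select(X: Matrix, y: Sequence[float],
                      keep_stage1: int, keep_stage2: int) -> List[int]:
    """Run the stages in order; each one sees only the previous stage's output."""
    survivors = variance_filter(X, keep_stage1)
    return correlation_filter(X, y, survivors, keep_stage2)


# Toy data (hypothetical): 4 samples x 4 features, binary subtype labels.
X = [
    [1.0, 5.0, 0.0, 0.0],
    [2.0, 5.0, 1.0, 0.0],
    [3.0, 5.0, 0.0, 2.0],
    [4.0, 5.0, 1.0, 2.0],
]
y = [0, 0, 1, 1]

print(variance_filter(X, 3))          # → [0, 2, 3]  (constant feature 1 dropped)
print(sequential_select(X, y, 3, 2))  # → [0, 3]
```

Here the second stage scores only three of the four features. On omics data with many thousands of features, the same ordering would let an expensive method run on a small fraction of the columns, which is one plausible reading of why sequential application lowers cost.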


Taxonomy

Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks

Methods: Feature Selection