The efficacy of various machine learning models for multi-class classification of RNA-seq expression data

Sterling Ramroach, Melford John, and Ajay Joshi

arXiv:1908.06817·cs.LG·August 20, 2019

TL;DR

This study evaluates five machine learning models for multi-class cancer classification using RNA-seq data, finding ensemble methods achieve near-perfect accuracy across most cancer types, even with reduced gene features.

Contribution

The paper systematically compares multiple machine learning algorithms for RNA-seq-based cancer classification, highlighting the superior performance of ensemble methods.

Findings

01

Ensemble algorithms achieve 100% accuracy on 14 of 17 cancer types.

02

Feature reduction to 20 genes maintains over 95% accuracy with ensembles.

03

Clustering and classification models perform poorly due to dataset noise.
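
The first finding is stated per cancer type, so it depends on computing accuracy separately for each class rather than once over the whole test set. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code: the function names `per_class_accuracy` and `count_perfect_classes` and the toy labels are assumptions made for illustration.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Return {label: fraction of that label's samples predicted correctly}."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


def count_perfect_classes(y_true, y_pred):
    """Count the classes whose samples were all predicted correctly."""
    accuracy = per_class_accuracy(y_true, y_pred)
    return sum(1 for value in accuracy.values() if value == 1.0)


# Toy labels invented for illustration only (not data from the paper).
y_true = ["type_a", "type_a", "type_b", "type_b", "type_c", "type_c"]
y_pred = ["type_a", "type_a", "type_b", "type_c", "type_c", "type_c"]

accuracy_by_type = per_class_accuracy(y_true, y_pred)
perfect = count_perfect_classes(y_true, y_pred)
```

Here one `type_b` sample is misclassified as `type_c`, so only two of the three toy classes count as perfectly classified. Applied to 17 real cancer types, the same count is the quantity the first finding reports.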

Abstract

Late diagnosis and high costs are key factors that negatively impact the care of cancer patients worldwide. Although the availability of biological markers for the diagnosis of cancer type is increasing, costs and reliability of tests currently present a barrier to the adoption of their routine use. There is a pressing need for accurate methods that enable early diagnosis and cover a broad range of cancers. The use of machine learning and RNA-seq expression analysis has shown promise in the classification of cancer type. However, research is inconclusive about which types of machine learning model are optimal. The suitability of five algorithms was assessed for the classification of 17 different cancer types. Each algorithm was fine-tuned and trained on the full array of 18,015 genes per sample, for 4,221 samples (75% of the dataset). They were then tested with 1,408 samples (25% of…
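
The abstract describes a 75%/25% split of 5,629 samples (4,221 for training, 1,408 for testing). The sketch below shows one way such a split could be made so that each cancer type keeps roughly the same proportion in both parts. It is an illustration under stated assumptions: the abstract does not say whether the authors stratified by class, and `stratified_split`, its parameters, and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Return (train_indices, test_indices), splitting each class separately."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Toy labels invented for illustration only (not data from the paper).
labels = ["type_a"] * 8 + ["type_b"] * 4
train_idx, test_idx = stratified_split(labels)

# Sample counts quoted in the abstract.
n_train, n_test = 4221, 1408
train_share = n_train / (n_train + n_test)
```

With the toy labels, six of the eight `type_a` samples and three of the four `type_b` samples land in the training set. The counts quoted in the abstract give a training share of about 0.75, consistent with the stated percentages.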
