Toward Scalable Machine Learning and Data Mining: the Bioinformatics Case

Faraz Faghri; Sayed Hadi Hashemi; Mohammad Babaeizadeh; Mike A. Nalls; Saurabh Sinha; Roy H. Campbell

arXiv:1710.00112 · cs.DC · October 3, 2017 · 5 cites

Open Access

TL;DR

This paper reviews key machine learning and data mining algorithms used in bioinformatics to guide scalable computing efforts for handling big biological data efficiently.

Contribution

It identifies the most influential machine learning and data mining algorithms used in bioinformatics and recommends that scalable-computing efforts target these first to address big data challenges.

Findings

01

Highlights top algorithms in classification, clustering, regression, graphical model-based learning, and dimensionality reduction.

02

Emphasizes the need for scalable storage and computation in bioinformatics.

03

Guides future research directions in scalable bioinformatics algorithms.

Abstract

In an effort to overcome the data deluge in computational biology and bioinformatics and to facilitate bioinformatics research in the era of big data, we identify some of the most influential algorithms that have been widely used in the bioinformatics community. These top data mining and machine learning algorithms cover classification, clustering, regression, graphical model-based learning, and dimensionality reduction. The goal of this study is to guide the focus of scalable computing experts in the endeavor of applying new storage and scalable computation designs to bioinformatics algorithms that merit their attention most, following the engineering maxim of "optimize the common case".

Taxonomy

Topics: Machine Learning in Bioinformatics · Gene expression and cancer classification · Genetics, Bioinformatics, and Biomedical Research