Data-driven Advice for Applying Machine Learning to Bioinformatics Problems

Randal S. Olson; William La Cava; Zairah Mustahsan; Akshay Varik; Jason H. Moore

arXiv:1708.05070·q-bio.QM·January 9, 2018

TL;DR

This paper analyzes 13 machine learning algorithms across 165 bioinformatics classification problems to provide data-driven recommendations and guidelines for effective algorithm selection and tuning.

Contribution

The paper offers a comprehensive empirical comparison of the 13 algorithms and turns the results into practical recommendations for applying machine learning in bioinformatics.

Findings

01 Five algorithms recommended, each with hyperparameters that maximize classifier performance across the tested problems

02 Statistical and visual comparisons of algorithm performance provided

03 Guidelines for model selection and tuning in bioinformatics

Abstract

As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset. The analysis culminates in the recommendation of five algorithms with hyperparameters that maximize classifier performance across the tested problems, as well as general guidelines for applying machine learning to supervised classification problems.
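To make "quantify the effect of model selection and algorithm tuning" concrete, here is a minimal Python sketch of two summaries such a benchmark could report: the average gain from tuning and the mean rank of each algorithm across datasets. The abstract does not say which statistics the authors used, so mean rank is an assumption here. The dataset names, algorithm names, and accuracy values are hypothetical placeholders, not results from the paper.

```python
from statistics import mean

# Placeholder accuracies for illustration only, NOT results from the paper.
# scores[dataset][algorithm] = (score with default hyperparameters,
#                               score with tuned hyperparameters)
scores = {
    "dataset_a": {"algo_x": (0.80, 0.85), "algo_y": (0.70, 0.90), "algo_z": (0.75, 0.76)},
    "dataset_b": {"algo_x": (0.60, 0.70), "algo_y": (0.65, 0.68), "algo_z": (0.50, 0.66)},
}


def tuning_gain(scores):
    """Mean (tuned - default) score per algorithm across all datasets."""
    gains = {}
    for per_algo in scores.values():
        for algo, (default, tuned) in per_algo.items():
            gains.setdefault(algo, []).append(tuned - default)
    return {algo: mean(vals) for algo, vals in gains.items()}


def mean_rank(scores):
    """Mean rank per algorithm by tuned score (1 = best).

    Algorithms with equal scores on a dataset share the average of the
    rank positions they occupy.
    """
    ranks = {}
    for per_algo in scores.values():
        tuned = {algo: pair[1] for algo, pair in per_algo.items()}
        for algo, score in tuned.items():
            better = sum(1 for s in tuned.values() if s > score)
            tied = sum(1 for s in tuned.values() if s == score)
            ranks.setdefault(algo, []).append(better + (tied + 1) / 2)
    return {algo: mean(vals) for algo, vals in ranks.items()}


if __name__ == "__main__":
    print("Mean tuning gain:", tuning_gain(scores))
    print("Mean rank:", mean_rank(scores))
```

Ranking within each dataset before averaging keeps easy and hard datasets on the same scale. Averaging raw accuracies instead would let datasets with a wide spread of scores dominate the summary.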
