TL;DR
This paper analyzes 13 machine learning algorithms across 165 bioinformatics classification problems to provide data-driven recommendations and guidelines for effective algorithm selection and tuning.
Contribution
It quantifies the effect of model selection and hyperparameter tuning for each algorithm and dataset, and turns the results into practical recommendations for applying machine learning in bioinformatics.
Findings
Five algorithms recommended, each with hyperparameters that maximize classifier performance across the tested problems
Statistical and visual performance comparisons provided
Guidelines for model selection and tuning in bioinformatics
Abstract
As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset. The analysis culminates in the recommendation of five algorithms with hyperparameters that maximize classifier performance across the tested problems, as well as general guidelines for applying machine learning to supervised classification problems.
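As a rough illustration of how a comparison across many datasets might be summarized, the sketch below ranks algorithms within each dataset, averages those ranks, and measures the gain from tuning as the tuned score minus the default score. The abstract does not state which aggregation the authors used, so mean ranking is an assumption here. The function names, dataset names, algorithm names, and score values are all hypothetical placeholders, not results from the paper.

```python
from statistics import mean


def mean_ranks(scores):
    """Average per-dataset rank of each algorithm (rank 1 = best score).

    scores: {dataset: {algorithm: score}}, where higher scores are better.
    Tied algorithms share the average of the ranks they span.
    """
    ranks = {}
    for per_algo in scores.values():
        ordered = sorted(per_algo.items(), key=lambda kv: kv[1], reverse=True)
        i = 0
        while i < len(ordered):
            # Extend j to the end of the run of scores tied with position i.
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            shared_rank = ((i + 1) + (j + 1)) / 2
            for name, _ in ordered[i:j + 1]:
                ranks.setdefault(name, []).append(shared_rank)
            i = j + 1
    return {name: mean(r) for name, r in ranks.items()}


def tuning_gain(default_scores, tuned_scores):
    """Per-dataset, per-algorithm improvement from tuning (tuned - default)."""
    return {
        ds: {algo: tuned_scores[ds][algo] - default_scores[ds][algo]
             for algo in default_scores[ds]}
        for ds in default_scores
    }


# Made-up illustrative scores (NOT taken from the paper).
default = {
    "dataset_a": {"algo_x": 0.80, "algo_y": 0.70, "algo_z": 0.60},
    "dataset_b": {"algo_x": 0.65, "algo_y": 0.75, "algo_z": 0.65},
}
tuned = {
    "dataset_a": {"algo_x": 0.85, "algo_y": 0.80, "algo_z": 0.60},
    "dataset_b": {"algo_x": 0.70, "algo_y": 0.75, "algo_z": 0.75},
}

if __name__ == "__main__":
    print(mean_ranks(default))
    print(tuning_gain(default, tuned))
```

Ranking within each dataset before averaging keeps an easy dataset with uniformly high scores from dominating the summary, which is one reason rank-based aggregation is a common choice when problems vary widely in difficulty.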
