# Progressive Sampling-Based Bayesian Optimization for Efficient and Automatic Machine Learning Model Selection

**Authors:** Xueqiang Zeng, Gang Luo

arXiv: 1812.02855 · 2018-12-10

## TL;DR

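This paper introduces progressive sampling-based Bayesian optimization, a method that automates the selection of both machine learning algorithms and hyper-parameter values. Compared to a state-of-the-art automatic selection method, it significantly reduces search time, classification error rate, and the standard deviation of error rate due to randomization. The work targets large clinical data sets, where existing automatic selection methods are inefficient.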

## Contribution

The paper proposes progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values, and reports an implementation of it.

## Key findings

- Significantly reduces search time compared to a state-of-the-art automatic selection method
- Lowers the classification error rate relative to the same baseline
- Decreases the standard deviation of error rate due to randomization

## Abstract

**Purpose:** Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era.

**Methods:** To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values.

**Results:** We report an implementation of the method. We show that compared to a state-of-the-art automatic selection method, our method can significantly reduce search time, classification error rate, and standard deviation of error rate due to randomization.

**Conclusions:** This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks.
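The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general progressive sampling idea: score many candidate configurations cheaply on a small sample, then spend larger samples only on the candidates that still look promising. Everything in it is an assumption made for illustration, not the authors' implementation. The synthetic data, the threshold classifier standing in for an algorithm and hyper-parameter combination, the function names, and the parameters `start_size`, `growth`, and `keep_fraction` are all hypothetical. The Bayesian optimization component is not modeled at all; a simple keep-the-best-fraction rule (in the style of successive halving) is swapped in for it.

```python
import random


def make_data(n, seed=0):
    """Synthetic 1-D binary data (hypothetical): the label is 1 when x > 0.6."""
    rng = random.Random(seed)
    return [(x, int(x > 0.6)) for x in (rng.random() for _ in range(n))]


def error_rate(threshold, sample):
    """Error rate of the rule 'predict 1 when x > threshold' on a sample."""
    wrong = sum(1 for x, y in sample if int(x > threshold) != y)
    return wrong / len(sample)


def progressive_search(candidates, data, start_size=50, growth=2, keep_fraction=0.5):
    """Score candidates on growing prefixes of the data, keeping the best
    fraction after each round.

    Returns the best surviving candidate and a history of
    (sample_size, number_of_candidates_evaluated) pairs.
    """
    if not candidates or not data:
        raise ValueError("candidates and data must be non-empty")
    survivors = list(candidates)
    size = min(start_size, len(data))
    history = []
    while True:
        sample = data[:size]  # data is already in random order
        ranked = sorted(survivors, key=lambda c: error_rate(c, sample))
        history.append((size, len(ranked)))
        # Stop when one candidate remains or the full data set has been used.
        if len(ranked) == 1 or size == len(data):
            return ranked[0], history
        survivors = ranked[:max(1, int(len(ranked) * keep_fraction))]
        size = min(size * growth, len(data))


if __name__ == "__main__":
    candidates = [i / 10 for i in range(1, 10)]  # thresholds 0.1 .. 0.9
    best, history = progressive_search(candidates, make_data(800))
    print("best candidate:", best)
    print("rounds (sample size, candidates evaluated):", history)
```

In this toy setting the cost saving comes from the schedule: all nine candidates are scored only on the smallest sample, and the largest sample is touched by a single survivor. A real system would replace the threshold rule with model training and validation, and would use a surrogate model to propose new configurations instead of starting from a fixed candidate list.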

---
Source: https://tomesphere.com/paper/1812.02855