Binary Bleed: Fast Distributed and Parallel Method for Automatic Model Selection
Ryan Barron (1, 3), Maksim E. Eren (2, 3), Manish Bhattarai (1), Ismael Boureima (1), Cynthia Matuszek (2, 3), Boian S. Alexandrov (1) ((1), Theoretical Division, Los Alamos National Laboratory, Los Alamos, USA, (2), Advanced Research in Cyber Systems

TL;DR
Binary Bleed is a fast, heuristic-based binary search method that efficiently determines the optimal number of clusters or components in machine learning models, reducing computational effort while maintaining accuracy.
Contribution
The paper introduces Binary Bleed, a novel binary search approach that significantly reduces the search space for hyper-parameter k in clustering and dimensionality reduction algorithms.
Findings
Reduces search space for k in ML models
Accurately identifies optimal k with less computation
Works with distributed and serial computing environments
Abstract
In several Machine Learning (ML) clustering and dimensionality reduction approaches, such as non-negative matrix factorization (NMF), RESCAL, and K-Means clustering, users must select a hyper-parameter k to define the number of clusters or components that yield an ideal separation of samples or clean clusters. This selection, while difficult, is crucial to avoid overfitting or underfitting the data. Several ML applications use scoring methods (e.g., Silhouette and Davies-Bouldin scores) to evaluate the cluster pattern stability for a specific k. The score is calculated for different trials over a range of k, and the ideal k is heuristically selected as the value before the model starts overfitting, indicated by a drop or increase in the score resembling an elbow curve plot. While the grid-search method can be used to accurately find a good k value, visiting a range of k can become…
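To illustrate the contrast the abstract draws between grid search and a binary search over k, here is a minimal sketch. It is not the authors' algorithm. It assumes a maximized score that stays at or above a chosen threshold up to the ideal k and falls below it afterwards. The names `grid_search_k`, `binary_search_k`, `toy_score`, and the threshold value are hypothetical and introduced only for this example. In practice the score function would fit a model (for example K-Means or NMF) for the given k and return a stability score.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def grid_search_k(
    ks: Sequence[int], score: Callable[[int], float], threshold: float
) -> Tuple[Optional[int], List[int]]:
    """Exhaustive baseline: evaluate every k, keep the largest one meeting the threshold."""
    best = None
    visited = []
    for k in ks:
        visited.append(k)
        if score(k) >= threshold:
            best = k
    return best, visited


def binary_search_k(
    ks: Sequence[int], score: Callable[[int], float], threshold: float
) -> Tuple[Optional[int], List[int]]:
    """Binary search for the largest k whose score meets the threshold.

    Assumption (not from the paper): over the sorted range `ks`, the score is
    at or above `threshold` up to the ideal k and below it for larger k.
    """
    lo, hi = 0, len(ks) - 1
    best = None
    visited = []
    while lo <= hi:
        mid = (lo + hi) // 2
        k = ks[mid]
        visited.append(k)
        if score(k) >= threshold:
            best = k        # k is acceptable; smaller k values need not be visited
            lo = mid + 1    # look for a larger acceptable k
        else:
            hi = mid - 1    # score dropped; the ideal k is smaller
    return best, visited


def toy_score(k: int) -> float:
    """Hypothetical stand-in for a model-fitting score: high up to k = 7, low after."""
    return 0.9 if k <= 7 else 0.2


ks = list(range(2, 31))
grid_best, grid_visited = grid_search_k(ks, toy_score, threshold=0.5)
bin_best, bin_visited = binary_search_k(ks, toy_score, threshold=0.5)
print(grid_best, len(grid_visited))
print(bin_best, len(bin_visited))
```

On this toy score both searches select k = 7, but the grid search evaluates all 29 candidates while the binary search evaluates 5. The saving depends entirely on the stated assumption about the shape of the score curve. A noisy or non-monotone score could lead a plain binary search to the wrong k.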
Taxonomy
Topics: Machine Learning and Data Classification
Methods: RESCAL
