BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees

Yongjoo Park; Jingyi Qing; Xiaoyang Shen; Barzan Mozafari

arXiv:1812.10564 · cs.LG · December 31, 2018


TL;DR

BlinkML is a system for fast, approximate training of machine learning models on a sample of the data. It provides probabilistic guarantees that the approximate model makes the same predictions as a model trained on the full dataset, which cuts training time on large datasets while preserving model quality.

Contribution

The paper introduces an approach for training ML models on samples with probabilistic quality guarantees. The approach applies to models that are trained by maximum likelihood estimation (MLE).

Findings

01. Speeds up training by up to 629x on large datasets

02. Provides high-probability guarantees on prediction consistency

03. Supports a wide range of MLE-based models
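
To make the notion of prediction consistency concrete, the following is a minimal sketch, not BlinkML's algorithm. It fits a logistic regression (an MLE-based model) once on all rows and once on a uniform random sample, then measures how often the two models predict the same label on held-out points. Everything here is an assumption made for illustration: the synthetic data, the sample size of 100, and the helper names `fit_logistic`, `agreement`, and `make_data`. BlinkML aims to guarantee such agreement without first training the full model; this sketch only measures it after the fact.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_logistic(xs, ys, steps=150, lr=0.5):
    """Maximize the average log-likelihood with plain gradient ascent."""
    d, n = len(xs[0]), len(xs)
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in zip(xs, ys):
            err = y - sigmoid(dot(w, x))
            for j in range(d):
                grad[j] += err * x[j]
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w


def predict(w, x):
    # Predict class 1 when the estimated probability is at least 0.5.
    return 1 if dot(w, x) >= 0 else 0


def agreement(w_a, w_b, xs):
    """Fraction of points on which two models predict the same label."""
    same = sum(predict(w_a, x) == predict(w_b, x) for x in xs)
    return same / len(xs)


def make_data(n, rng):
    # Hypothetical synthetic data: a bias term plus two Gaussian features,
    # with labels drawn from a known logistic model.
    true_w = [0.5, 2.0, -1.0]
    xs, ys = [], []
    for _ in range(n):
        x = [1.0, rng.gauss(0, 1), rng.gauss(0, 1)]
        xs.append(x)
        ys.append(1 if rng.random() < sigmoid(dot(true_w, x)) else 0)
    return xs, ys


rng = random.Random(0)
xs, ys = make_data(1000, rng)
holdout_xs, _ = make_data(500, rng)

# "Full" model: trained on every row.
full_w = fit_logistic(xs, ys)

# "Approximate" model: trained on a 10% uniform random sample.
idx = rng.sample(range(len(xs)), 100)
sample_w = fit_logistic([xs[i] for i in idx], [ys[i] for i in idx])

rate = agreement(full_w, sample_w, holdout_xs)
print(f"prediction agreement between sample and full model: {rate:.3f}")
```

Increasing the sample size in `rng.sample` should generally push the agreement rate toward 1, which is the error-computation tradeoff that sampling-based training exposes.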

Abstract

The rising volume of datasets has made training machine learning (ML) models a major computational cost in the enterprise. Given the iterative nature of model and parameter tuning, many analysts use a small sample of their entire data during their initial stage of analysis to make quick decisions (e.g., what features or hyperparameters to use) and use the entire dataset only in later stages (i.e., when they have converged to a specific model). This sampling, however, is performed in an ad-hoc fashion. Most practitioners cannot precisely capture the effect of sampling on the quality of their model, and eventually on their decision-making process during the tuning phase. Moreover, without systematic support for sampling operators, many optimizations and reuse opportunities are lost. In this paper, we introduce BlinkML, a system for fast, quality-guaranteed ML training. BlinkML allows…
