One Size Does Not Fit All: Quantifying and Exposing the Accuracy-Latency Trade-off in Machine Learning Cloud Service APIs via Tolerance Tiers
Matthew Halpern, Behzad Boroujerdian, Todd Mummert, Evelyn Duesterwald, Vijay Janapa Reddi

TL;DR
This paper proposes Tolerance Tiers for MLaaS cloud services, allowing users to select accuracy and latency levels suited to their needs, thereby improving efficiency over traditional uniform service deployment.
Contribution
It introduces Tolerance Tiers as a novel approach to customize MLaaS performance, addressing the limitations of the one-size-fits-all deployment strategy.
Findings
Tolerance Tiers improve service efficiency and user satisfaction.
The approach outperforms traditional uniform deployment in accuracy-latency trade-offs.
The approach is evaluated on a CPU-based automatic speech recognition (ASR) engine and on neural networks for image classification.
Abstract
Today's cloud service architectures follow a "one size fits all" deployment strategy where the same service version instantiation is provided to the end users. However, consumers are broad and different applications have different accuracy and responsiveness requirements, which as we demonstrate renders the "one size fits all" approach inefficient in practice. We use a production-grade speech recognition engine, which serves several thousands of users, and an open source computer vision based system, to explain our point. To overcome the limitations of the "one size fits all" approach, we recommend Tolerance Tiers where each MLaaS tier exposes an accuracy/responsiveness characteristic, and consumers can programmatically select a tier. We evaluate our proposal on the CPU-based automatic speech recognition (ASR) engine and cutting-edge neural networks for image classification deployed on…
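As a rough illustration of what "programmatically select a tier" could look like from the consumer's side, here is a minimal sketch. It is not the paper's API: the `Tier` class, the `select_tier` function, the tier names, and all accuracy and latency numbers are hypothetical placeholders chosen only to show the idea of trading a bounded accuracy loss for a faster response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One hypothetical service tier (all fields are illustrative assumptions)."""
    name: str
    accuracy_loss: float  # accuracy given up relative to the most accurate tier
    latency_ms: float     # typical response time of this tier


# Placeholder tiers; these numbers are NOT results from the paper.
TIERS = [
    Tier("baseline", 0.00, 500.0),
    Tier("balanced", 0.01, 300.0),
    Tier("fast", 0.05, 150.0),
]


def select_tier(tolerance, tiers=TIERS):
    """Return the lowest-latency tier whose accuracy loss is within `tolerance`."""
    eligible = [t for t in tiers if t.accuracy_loss <= tolerance]
    if not eligible:
        raise ValueError("no tier satisfies the requested tolerance")
    return min(eligible, key=lambda t: t.latency_ms)


if __name__ == "__main__":
    # A consumer that tolerates no accuracy loss gets the most accurate tier.
    print(select_tier(0.0).name)
    # A consumer that tolerates up to 2% loss gets a faster tier.
    print(select_tier(0.02).name)
```

Under these assumptions, a latency-sensitive application would pass a larger tolerance and be routed to a faster tier, while an accuracy-critical one would pass zero and receive the same service a uniform deployment would provide.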
