Biathlon: Harnessing Model Resilience for Accelerating ML Inference Pipelines
Chaokun Chang, Eric Lo, Chunxiao Ye

TL;DR
Biathlon is a machine learning serving system that exploits model resilience to approximate input features, significantly accelerating inference pipelines while keeping accuracy loss within a guaranteed bound.
Contribution
Biathlon speeds up ML inference by exploiting model resilience and adaptively choosing how much to approximate each aggregation feature.
Findings
Achieves 5.3x to 16.6x speedup on real pipelines
Keeps accuracy close to that of the exact pipeline despite approximate features
Demonstrates effectiveness on pipelines from industry and competition datasets
Abstract
Machine learning inference pipelines commonly encountered in data science and industries often require real-time responsiveness due to their user-facing nature. However, meeting this requirement becomes particularly challenging when certain input features require aggregating a large volume of data online. Recent literature on interpretable machine learning reveals that most machine learning models exhibit a notable degree of resilience to variations in input. This suggests that machine learning models can effectively accommodate approximate input features with minimal discernible impact on accuracy. In this paper, we introduce Biathlon, a novel ML serving system that leverages the inherent resilience of models and determines the optimal degree of approximation for each aggregation feature. This approach enables maximum speedup while ensuring a guaranteed bound on accuracy loss. We…
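To make the idea in the abstract concrete, here is a minimal Python sketch of serving a prediction from an approximate aggregation feature. It is an illustration under stated assumptions, not Biathlon's actual algorithm: the names `approx_mean`, `toy_model`, and `serve`, the sample-doubling schedule, and the interval-endpoint stability check are all hypothetical. The sketch estimates a mean from a random sample, and grows the sample only while the estimate's uncertainty could still change the model's output.

```python
import random
import statistics


def approx_mean(values, n, rng):
    """Estimate the mean from a random sample of n rows, with its standard error."""
    sample = rng.sample(values, n)
    mean = statistics.fmean(sample)
    # Standard error of the sample mean; undefined for a single row, so use 0.
    stderr = statistics.stdev(sample) / n ** 0.5 if n > 1 else 0.0
    return mean, stderr


def toy_model(feature):
    """Hypothetical classifier: a simple threshold on one aggregated feature."""
    return 1 if feature > 50.0 else 0


def serve(values, model, start=100, growth=2, z=3.0, seed=0):
    """Return (prediction, rows_used), sampling more rows only when needed.

    Assumption: the model is monotone in the feature, so checking the two
    ends of the confidence interval is enough to decide stability. A general
    model would need to probe points inside the interval as well.
    """
    rng = random.Random(seed)
    n = min(start, len(values))
    while True:
        mean, stderr = approx_mean(values, n, rng)
        low = model(mean - z * stderr)
        high = model(mean + z * stderr)
        # Stop when the prediction cannot flip within the interval,
        # or when every row has been read and the feature is exact.
        if low == high or n == len(values):
            return model(mean), n
        n = min(n * growth, len(values))


if __name__ == "__main__":
    data_rng = random.Random(1)
    rows = [data_rng.gauss(80.0, 10.0) for _ in range(100_000)]
    prediction, rows_used = serve(rows, toy_model)
    print(prediction, rows_used)
```

When the true aggregate lies far from the model's decision boundary, a small sample already pins down the prediction, which is where the speedup comes from. Inputs near the boundary fall back to reading more rows, up to the exact computation.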
Taxonomy
Topics: Scientific Computing and Data Management · Topic Modeling · Explainable Artificial Intelligence (XAI)
