Helix: Holistic Optimization for Accelerating Iterative Machine Learning
Doris Xin, Stephen Macke, Litian Ma, Jialin Liu, Shuchen Song, Aditya Parameswaran

TL;DR
Helix is a system that optimizes iterative machine learning workflows by intelligently caching and recomputing intermediates, significantly reducing runtime compared to existing systems across diverse applications.
Contribution
Helix introduces a holistic optimization approach for iterative ML workflows, combining a Scala DSL with algorithms for caching and reuse, addressing a gap in current ML systems.
Findings
Achieves up to 19x speedup over state-of-the-art systems
Handles diverse ML workflows within a unified framework
Provides effective lightweight heuristics for the NP-hard caching problem
Abstract
Machine learning workflow development is a process of trial-and-error: developers iterate on workflows by testing out small modifications until the desired accuracy is achieved. Unfortunately, existing machine learning systems focus narrowly on model training (a small fraction of the overall development time) and neglect to address iterative development. We propose Helix, a machine learning system that optimizes the execution across iterations by intelligently caching and reusing, or recomputing, intermediates as appropriate. Helix captures a wide variety of application needs within its Scala DSL, with succinct syntax defining unified processes for data preprocessing, model specification, and learning. We demonstrate that the reuse problem can be cast as a Max-Flow problem, while the caching problem is NP-Hard. We develop effective lightweight heuristics for the latter. Empirical…
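As a rough illustration of the reuse decision the abstract describes, the Python sketch below picks, for each intermediate in a tiny workflow, whether to load a cached copy, recompute it, or skip it. The node names, the costs, and the three state labels (load, compute, prune) are assumptions made for this example and are not taken from the Helix implementation. The search is plain brute force over all assignments, which only works for a handful of nodes; it is not the Max-Flow formulation the abstract refers to.

```python
from itertools import product

# Hypothetical toy workflow (all names and costs are made up for illustration):
# node -> (parents, compute_cost, load_cost), with load_cost None if no cached copy exists.
WORKFLOW = {
    "raw":      ([],           10.0, None),
    "features": (["raw"],      30.0, 4.0),   # cached by an earlier iteration
    "model":    (["features"], 50.0, None),  # changed this iteration, so nothing to reuse
    "eval":     (["model"],     5.0, None),
}

STATES = ("load", "compute", "prune")


def plan_cost(workflow, states):
    """Return the total cost of a state assignment, or None if it is infeasible."""
    total = 0.0
    for node, (parents, compute_cost, load_cost) in workflow.items():
        state = states[node]
        if state == "load":
            if load_cost is None:
                return None  # cannot load something that was never cached
            total += load_cost
        elif state == "compute":
            # Computing a node needs every parent to be available in memory.
            if any(states[p] == "prune" for p in parents):
                return None
            total += compute_cost
        # A pruned node costs nothing.
    return total


def best_plan(workflow, outputs):
    """Try every assignment and keep the cheapest one that produces all outputs."""
    nodes = list(workflow)
    best = None
    for combo in product(STATES, repeat=len(nodes)):
        states = dict(zip(nodes, combo))
        if any(states[o] == "prune" for o in outputs):
            continue  # requested outputs must be materialized
        cost = plan_cost(workflow, states)
        if cost is not None and (best is None or cost < best[0]):
            best = (cost, states)
    return best


cost, states = best_plan(WORKFLOW, ["eval"])
print(cost, states)  # 59.0, with raw pruned and features loaded
```

In this toy case the cheapest plan loads the cached features, so the raw input never has to be recomputed. The number of assignments grows as 3^n in the number of nodes, which is why a polynomial-time formulation of the reuse problem matters for realistic workflows.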
Taxonomy
Topics: Scientific Computing and Data Management · Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques
