TL;DR
This paper introduces ConEx, a method that combines Evolutionary Markov Chain Monte Carlo (EMCMC) sampling with cost-reduction techniques to explore big-data system configurations efficiently, finding better-performing configurations than existing approaches.
Contribution
The paper presents a new cost-effective approach using EMCMC and data-driven proxies to optimize big-data system configurations at scale.
Findings
Outperforms random sampling, genetic algorithms, and predictive models.
Uses scaled-up big-data jobs as proxies for the objective function of larger jobs.
Employs a dynamic job similarity measure to transfer results between jobs.
Abstract
Configuration space complexity makes big-data software systems hard to configure well. Consider Hadoop: with over nine hundred parameters, developers often just use the default configurations provided with Hadoop distributions. The opportunity costs in lost performance are significant. Popular learning-based approaches to auto-tuning software do not scale well for big-data systems because of the high cost of collecting training data. We present a new method based on a combination of Evolutionary Markov Chain Monte Carlo (EMCMC) sampling and cost reduction techniques to cost-effectively find better-performing configurations for big data systems. For cost reduction, we developed and experimentally validated two approaches: using scaled-up big data jobs as proxies for the objective function for larger jobs and using a dynamic job similarity measure to infer that results…
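To make the sampling idea in the abstract concrete, here is a minimal Python sketch of an evolutionary, Metropolis-style search over a configuration space. It is an illustration under stated assumptions, not the paper's algorithm: the parameter names in `SPACE`, the `toy_runtime` objective, the temperature, and the crossover and mutation operators are all hypothetical, and the proposal scheme is not claimed to satisfy detailed balance. In a real setting the objective would be the measured runtime of a (possibly proxy) job.

```python
import math
import random

# Hypothetical toy configuration space: parameter name -> allowed values.
SPACE = {
    "map_tasks": [2, 4, 8, 16, 32],
    "io_sort_mb": [50, 100, 200, 400],
    "compress": [False, True],
}


def toy_runtime(cfg):
    """Stand-in objective (lower is better); a real run would time a job."""
    t = 100.0 / cfg["map_tasks"] + 2.0 * cfg["map_tasks"]
    t += abs(cfg["io_sort_mb"] - 200) / 20.0
    t -= 5.0 if cfg["compress"] else 0.0
    return t


def random_config(rng):
    """Draw one configuration uniformly at random from SPACE."""
    return {k: rng.choice(v) for k, v in SPACE.items()}


def mutate(cfg, rng):
    """Resample a single randomly chosen parameter."""
    child = dict(cfg)
    key = rng.choice(sorted(SPACE))
    child[key] = rng.choice(SPACE[key])
    return child


def crossover(a, b, rng):
    """Uniform crossover: take each parameter from either parent."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SPACE}


def emcmc_search(objective, generations=50, pop_size=8, temperature=2.0, seed=0):
    """Evolve a population of chains with Metropolis-style acceptance."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    costs = [objective(c) for c in population]
    best_cost = min(costs)
    best_cfg = dict(population[costs.index(best_cost)])
    for _ in range(generations):
        for i in range(pop_size):
            mate = population[rng.randrange(pop_size)]
            proposal = mutate(crossover(population[i], mate, rng), rng)
            cost = objective(proposal)
            # Always accept improvements; accept worse proposals with
            # probability exp(-delta / temperature).
            delta = cost - costs[i]
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                population[i], costs[i] = proposal, cost
                if cost < best_cost:
                    best_cfg, best_cost = dict(proposal), cost
    return best_cfg, best_cost
```

Calling `emcmc_search(toy_runtime)` returns the best configuration seen and its cost. Occasionally accepting worse proposals is what distinguishes this from a purely greedy search: it lets chains escape local optima, while crossover shares good parameter settings across the population.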
