ExpoCloud: a Framework for Time and Budget-Effective Parameter Space Explorations Using a Cloud Compute Engine
Meir Goldenberg

TL;DR
ExpoCloud is a framework that leverages cloud computing to efficiently and cost-effectively explore large parameter spaces through high concurrency, dynamic resource management, and fault tolerance.
Contribution
It introduces a flexible cloud-based framework for large-scale parameter exploration that optimizes time and budget, with implementations for Google Cloud and local simulation.
Findings
Enables high concurrency in parameter exploration
Reduces costs by terminating unnecessary instances
Supports fault tolerance for large experiments
Abstract
Large parameter space explorations are among the most time-consuming yet critically important tasks in many fields of modern research. ExpoCloud enables the researcher to harness cloud compute resources to achieve time and budget-effective large-scale concurrent parameter space explorations. ExpoCloud enables maximal possible levels of concurrency by creating compute instances on-the-fly, saves money by terminating unneeded instances, and provides a mechanism for saving both time and money by avoiding the exploration of parameter settings that are as hard as or harder than the parameter settings whose exploration timed out. Effective fault tolerance mechanisms make ExpoCloud suitable for large experiments. ExpoCloud provides an interface that allows its use under various cloud environments. As a proof of concept, we implemented a class supporting the Google Compute Engine (GCE). We also…
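The pruning idea in the abstract (skipping settings that are as hard as or harder than one that already timed out) can be illustrated with a minimal sketch. This is not ExpoCloud's actual API: the `TimeoutPruner` class, the `as_hard_or_harder` function, and the representation of hardness as a tuple of numbers compared component-wise are all assumptions made for illustration only.

```python
from typing import List, Tuple

# Assumption: the hardness of a parameter setting is a tuple of numbers,
# and a setting is "as hard or harder" when every component is >=.
Hardness = Tuple[int, ...]


def as_hard_or_harder(a: Hardness, b: Hardness) -> bool:
    """Return True if `a` is component-wise at least as hard as `b`."""
    return len(a) == len(b) and all(x >= y for x, y in zip(a, b))


class TimeoutPruner:
    """Hypothetical helper that remembers which hardness levels timed out."""

    def __init__(self) -> None:
        self._timed_out: List[Hardness] = []

    def record_timeout(self, hardness: Hardness) -> None:
        # Remember a setting whose exploration exceeded the deadline.
        self._timed_out.append(hardness)

    def should_skip(self, hardness: Hardness) -> bool:
        # Skip any setting dominated by a previously timed-out one.
        return any(as_hard_or_harder(hardness, t) for t in self._timed_out)


pruner = TimeoutPruner()
pruner.record_timeout((10, 3))  # e.g. (problem size, number of agents)

candidates = [(5, 3), (10, 3), (12, 4), (12, 2)]
to_run = [c for c in candidates if not pruner.should_skip(c)]
print(to_run)
```

In this sketch, `(12, 2)` is still explored because it is harder in one dimension but easier in the other, so it is not dominated by the timed-out setting `(10, 3)`.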
Taxonomy
Topics: Cloud Computing and Resource Management · Scientific Computing and Data Management · Distributed systems and fault tolerance
