Predicting Intermediate Storage Performance for Workflow Applications
Lauro Beltrão Costa, Abmar Barros, Samer Al-Kiswany, Hao Yang, Emalayan Vairavanathan, Matei Ripeanu

TL;DR
This paper introduces a performance prediction mechanism for storage systems that serve workflow applications. By estimating application turnaround time accurately and far faster than actual runs, it makes configuration tuning efficient.
Contribution
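We designed an end-to-end prediction mechanism that models storage system operation at the data-chunk and control-message levels. It supports workflow-specific optimizations (e.g., data placement schemes) as well as traditional configuration knobs (e.g., stripe size, replication level).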
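To make "modeling at the data-chunk and control-message levels" concrete, here is a minimal sketch of a queue-based model for a single file write. It is not the authors' simulator. Every resource (client NIC, metadata manager, storage node) is assumed to be a FIFO server. The parameter names, default values, store-and-forward chunk handling, chain replication, and replica placement are all hypothetical choices made for illustration.

```python
import math


class FifoResource:
    """A single FIFO server (a NIC, a storage node, or the metadata manager).

    Requests are served one at a time in the order serve() is called;
    serve() returns the time at which the request finishes.
    """

    def __init__(self) -> None:
        self.free_at = 0.0

    def serve(self, arrival: float, service: float) -> float:
        start = max(arrival, self.free_at)
        self.free_at = start + service
        return self.free_at


def simulate_write(file_mb: float, chunk_mb: float, stripe_width: int,
                   replication: int, bw_mb_s: float = 100.0,
                   latency_s: float = 0.0002,
                   mgr_service_s: float = 0.0005) -> float:
    """Predict the time (seconds) for one client to write one file.

    Hypothetical model: chunks are stored-and-forwarded whole. Primaries go
    round-robin to nodes 0..stripe_width-1, and replica r of a chunk goes to
    a disjoint node group (node + r * stripe_width). Each node therefore sees
    its chunks in arrival order, so call-order FIFO service is exact here.
    """
    n_nodes = stripe_width * replication
    client_nic = FifoResource()
    manager = FifoResource()
    nodes = [FifoResource() for _ in range(n_nodes)]

    # Control level: one round trip to the manager to allocate chunk locations.
    t = manager.serve(latency_s, mgr_service_s) + latency_s

    done = t
    n_chunks = math.ceil(file_mb / chunk_mb)
    for c in range(n_chunks):
        size = min(chunk_mb, file_mb - c * chunk_mb)
        xfer = size / bw_mb_s
        # Data level: the client NIC pushes chunks back to back.
        sent = client_nic.serve(t, xfer)
        node = c % stripe_width
        arrived = nodes[node].serve(sent + latency_s, xfer)
        # Chain replication: each replica holder receives from the previous one.
        for r in range(1, replication):
            arrived = nodes[node + r * stripe_width].serve(arrived + latency_s, xfer)
        done = max(done, arrived)

    # Control level: one round trip to commit the file's chunk map.
    return manager.serve(done + latency_s, mgr_service_s) + latency_s
```

With zero latency and zero manager cost, this model writes 100 MB as a single 100 MB chunk in 2 s (the client sends, then the node receives). Ten 10 MB chunks let those two stages overlap and finish in about 1.1 s. Effects like this are what a chunk-level model captures and a whole-file model would miss.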
Findings
Achieves over 200x speedup in predicting application performance.
Provides accurate configuration recommendations for storage systems.
Scales to model entire cluster workflows effectively.
Abstract
Configuring a storage system to better serve an application is a challenging task, complicated by a multidimensional, discrete configuration space and the high cost of exploring it (e.g., by running the application under each candidate storage configuration). To enable selecting the best configuration in a reasonable time, we design an end-to-end performance prediction mechanism that estimates the turnaround time of an application using the storage system under a given configuration. This approach targets a generic object-based storage system design. It supports exploring the impact of optimizations targeting workflow applications (e.g., various data placement schemes) in addition to other, more traditional configuration knobs (e.g., stripe size or replication level), and it models the system operation at the data-chunk and control-message level. This paper presents our experience to date with…
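The workflow the abstract motivates, choosing a configuration by querying a predictor instead of running the application under each candidate, can be sketched as an exhaustive search over a small discrete space. The knobs echo those the abstract names (stripe size, replication level, data placement). However, the knob values, the `toy_predict` cost model, and its bandwidth and overhead constants are hypothetical stand-ins, not the paper's predictor.

```python
import itertools
from typing import Any, Callable, Dict, List, Tuple

Config = Dict[str, Any]

# Hypothetical configuration space, for illustration only.
SPACE: Dict[str, List[Any]] = {
    "stripe_kb": [256, 1024, 4096],
    "replication": [1, 2, 3],
    "placement": ["round_robin", "local"],
}


def toy_predict(cfg: Config, total_mb: float = 8192.0,
                net_mb_s: float = 100.0, disk_mb_s: float = 200.0,
                per_chunk_s: float = 0.001) -> float:
    """Hypothetical stand-in for a turnaround-time predictor (seconds)."""
    n_chunks = total_mb * 1024 / cfg["stripe_kb"]
    copies = cfg["replication"]
    # "local" keeps the first copy on the writer's node, so one fewer copy
    # crosses the network; every copy is still written to some disk.
    remote = copies - 1 if cfg["placement"] == "local" else copies
    return (total_mb * remote / net_mb_s
            + total_mb * copies / disk_mb_s
            + n_chunks * per_chunk_s)


def search(space: Dict[str, List[Any]],
           predict: Callable[[Config], float]) -> Tuple[float, Config, int]:
    """Evaluate every configuration; return (best time, best config, #evaluated)."""
    names = list(space)
    best_t, best_cfg = float("inf"), {}
    evaluated = 0
    for values in itertools.product(*(space[n] for n in names)):
        cfg = dict(zip(names, values))
        t = predict(cfg)
        evaluated += 1
        if t < best_t:
            best_t, best_cfg = t, cfg
    return best_t, best_cfg, evaluated
```

Even this toy space has 3 × 3 × 2 = 18 points. Each point would need a full application run without a predictor, so the cost of exhaustive exploration is dominated by how cheap a single prediction is.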
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems · Cloud Computing and Resource Management
