Performance modeling of a distributed file-system
Sandeep Kumar

TL;DR
This paper develops a multiple linear regression model to predict and analyze the performance of general-purpose distributed file-systems based on various configuration and workload features, aiding in better tuning and understanding.
Contribution
It introduces a regression-based performance modeling approach for distributed file-systems that accounts for multiple configuration parameters and workload variations.
Findings
The model accurately predicts file-system performance from configuration and workload features.
Feature importance ranking helps identify key performance influencers.
The approach supports better tuning for diverse workloads.
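As a rough illustration of the idea summarized above, a multiple linear regression can be fit by ordinary least squares and its features ranked by standardized effect size. This is a minimal sketch, not the paper's implementation: the feature names (`block_size_kb`, `num_clients`, `replica_count`), the coefficients, and the data are all hypothetical and synthetic, and the ranking heuristic (absolute coefficient times feature standard deviation) is one common choice that the paper may not use.

```python
import numpy as np

def fit_linear_model(X, y):
    """Fit y ~ intercept + X @ coefs by ordinary least squares."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]

def rank_features(X, coefs, names):
    """Rank features by |coefficient| * feature std (a simple heuristic)."""
    scores = np.abs(coefs) * X.std(axis=0)
    order = np.argsort(scores)[::-1]  # largest effect first
    return [names[i] for i in order]

# Synthetic data only: these features and coefficients are invented
# for illustration and are NOT measurements from any file system.
rng = np.random.default_rng(0)
names = ["block_size_kb", "num_clients", "replica_count"]
X = np.column_stack([
    rng.uniform(4, 1024, 200),   # hypothetical block size in KB
    rng.integers(1, 33, 200),    # hypothetical number of clients
    rng.integers(1, 4, 200),     # hypothetical replica count
])
true_coefs = np.array([0.05, -0.8, -2.0])
y = 100.0 + X @ true_coefs + rng.normal(0, 1.0, 200)  # noisy "throughput"

intercept, coefs = fit_linear_model(X, y)
ranking = rank_features(X, coefs, names)
predicted = intercept + X @ coefs  # model predictions for the same rows
```

Scaling each coefficient by its feature's standard deviation keeps features measured in large units (such as kilobytes) comparable with small-range features (such as a replica count) when ranking.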
Abstract
Data centers have become the center of big data processing. Most programs running in a data center process big data. The storage requirements of such programs cannot be fulfilled by a single node in the data center, and hence a distributed file system is used, in which storage resources are pooled together from more than one node and presented as a unified view to the outside world. Optimum performance of these distributed file-systems for a given workload is of paramount importance, as the disk is the slowest component in the framework. Owing to this fact, many big data processing frameworks implement their own file-system to get optimal performance by fine-tuning it for their specific workloads. However, fine-tuning a file system for a particular workload results in poor performance for workloads that do not match the profile of the desired workload. Hence, these file systems cannot be used…
Taxonomy
Topics: Advanced Data Storage Technologies · Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques
