Modeling the Linux page cache for accurate simulation of data-intensive applications
Hoang-Dung Do, Valerie Hayot-Sasson, Rafael Ferreira da Silva, Christopher Steele, Henri Casanova, Tristan Glatard

TL;DR
This paper introduces a detailed Linux page cache simulation model integrated into WRENCH, significantly improving the accuracy of data-intensive application performance predictions in simulation environments.
Contribution
The authors develop and implement a comprehensive Linux page cache model within WRENCH, enabling more accurate simulation of I/O behavior for data-intensive applications.
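To illustrate why accounting for a page cache changes simulated I/O times, here is a minimal sketch. It is not the authors' model and does not use the WRENCH API: the function `simulate_reads`, its parameters, whole-file caching granularity, and the plain LRU eviction policy are all assumptions made for this example. Cache hits are charged at memory bandwidth and misses at disk bandwidth.

```python
from collections import OrderedDict


def simulate_reads(accesses, file_sizes, disk_bw, mem_bw, cache_capacity):
    """Return the total simulated read time in seconds (toy model).

    accesses       -- file names, in the order they are read
    file_sizes     -- dict mapping file name to size in bytes
    disk_bw        -- disk read bandwidth in bytes/second
    mem_bw         -- memory read bandwidth in bytes/second
    cache_capacity -- bytes available for caching (0 disables caching)
    """
    cache = OrderedDict()  # name -> size, least recently used first
    used = 0               # bytes currently held in the cache
    total = 0.0
    for name in accesses:
        size = file_sizes[name]
        if name in cache:
            # Cache hit: served from memory, becomes most recently used.
            cache.move_to_end(name)
            total += size / mem_bw
            continue
        # Cache miss: served from disk.
        total += size / disk_bw
        if size > cache_capacity:
            continue  # file cannot fit in the cache at all
        # Evict least recently used files until the new one fits.
        while used + size > cache_capacity:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[name] = size
        used += size
    return total


if __name__ == "__main__":
    sizes = {"a": 100, "b": 100}
    reads = ["a", "b", "a"]
    # Same workload without a cache and with a cache that holds both files.
    print(simulate_reads(reads, sizes, disk_bw=10, mem_bw=100, cache_capacity=0))
    print(simulate_reads(reads, sizes, disk_bw=10, mem_bw=100, cache_capacity=200))
```

Even in this toy setting, a simulator that ignores caching overestimates the time of the repeated read, which is the kind of error a page cache model is meant to remove. A realistic model would also need to handle aspects this sketch omits, such as page-level granularity and written data.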
Findings
The model reduces simulation error by up to a factor of ten.
Accurately simulates both single-threaded and multithreaded applications.
Effective for local and remote I/O scenarios.
Abstract
The emergence of Big Data in recent years has resulted in a growing need for efficient data processing solutions. While infrastructures with sufficient compute power are available, the I/O bottleneck remains. The Linux page cache is an efficient approach to reduce I/O overheads, but few experimental studies of its interactions with Big Data applications exist, partly due to limitations of real-world experiments. Simulation is a popular approach to address these issues; however, existing simulation frameworks do not simulate page caching fully, or even at all. As a result, simulation-based performance studies of data-intensive applications lead to inaccurate results. In this paper, we propose an I/O simulation model that includes the key features of the Linux page cache. We have implemented this model as part of the WRENCH workflow simulation framework, which itself builds on the…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Advanced Data Storage Technologies · Cloud Computing and Resource Management
