Low-level I/O Monitoring for Scientific Workflows
Joel Witzke, Ansgar Lößer, Vasilis Bountris, Florian Schintke, Björn Scheuermann

TL;DR
This paper presents a method to correlate low-level I/O resource usage data with high-level scientific workflow tasks in distributed environments, enabling identification of bottlenecks and optimization opportunities.
Contribution
It introduces a technique to associate low-level resource traces with high-level workflow tasks using metadata, improving I/O analysis accuracy in distributed scientific workflows.
Findings
Effective correlation of resource traces with workflow tasks.
Identification of bottlenecks in I/O behavior.
Enhanced workflow optimization potential.
Abstract
While detailed resource usage monitoring is possible at the low level with proper tools, associating that usage with the higher-level application-layer abstractions that cause it in the first place presents a number of challenges. Suppose a large-scale scientific data analysis workflow runs in a distributed execution environment such as a compute cluster or cloud environment, and we want to analyze its I/O behaviour to find and alleviate potential bottlenecks. Different tasks of the workflow can be assigned to arbitrary compute nodes and may even share the same compute nodes. Thus, locally observed resource usage is not directly associated with the individual workflow tasks. By acquiring resource usage profiles of the involved nodes, we seek to correlate the trace data to the workflow and its individual tasks. To accomplish that, we select the…
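The correlation problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' method, since the abstract is truncated before the method is described. It assumes, hypothetically, that the workflow engine records the node, process ID, and start and end time of each task (`TaskRun`), and that a node-local tracer emits per-process I/O records (`IoEvent`). All names and fields are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRun:
    """Hypothetical workflow-engine metadata for one task execution."""
    task: str
    node: str
    pid: int
    start: float  # seconds since workflow start
    end: float


@dataclass(frozen=True)
class IoEvent:
    """Hypothetical low-level trace record collected on a compute node."""
    node: str
    pid: int
    timestamp: float
    bytes_moved: int


def attribute_io(events, runs):
    """Sum traced bytes per task; unmatched events are collected under None."""
    # Index runs by (node, pid) so each event only checks plausible candidates.
    index = defaultdict(list)
    for run in runs:
        index[(run.node, run.pid)].append(run)

    totals = defaultdict(int)
    for ev in events:
        owner = None
        for run in index.get((ev.node, ev.pid), []):
            # The time window guards against PID reuse on the same node.
            if run.start <= ev.timestamp <= run.end:
                owner = run.task
                break
        totals[owner] += ev.bytes_moved
    return dict(totals)


# Two tasks share node1 and overlap in time, as the abstract describes.
runs = [
    TaskRun("align", "node1", 101, 0.0, 10.0),
    TaskRun("sort", "node1", 202, 5.0, 15.0),
]
events = [
    IoEvent("node1", 101, 2.0, 4096),
    IoEvent("node1", 202, 6.0, 1024),
    IoEvent("node1", 202, 7.5, 2048),
    IoEvent("node2", 303, 3.0, 512),  # no known task on this node
]
per_task = attribute_io(events, runs)
print(per_task)
```

Keying on node and process ID, with the task's time window as a guard, is only one possible join. A real deployment might instead need container or cgroup identifiers, and it would have to handle child processes spawned by a task.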
Taxonomy
Topics: Scientific Computing and Data Management
