Components and Interfaces of a Process Management System for Parallel Programs
Ralph Butler, William Gropp, Ewing Lusk

TL;DR
This paper introduces MPD, a scalable process management system for parallel programs such as those written using MPI. It enables faster startup, improved runtime control, and better integration with parallel debuggers and utilities.
Contribution
The paper presents MPD, a scalable, efficient process management system for parallel jobs, with a flexible interface separating process management from parallel libraries.
Findings
Faster startup times for large parallel jobs.
Enhanced control over stdio for parallel processes.
Support for parallel debugging and utilities.
Abstract
Parallel jobs are different from sequential jobs and require a different type of process management. We present here a process management system for parallel programs such as those written using MPI. A primary goal of the system, which we call MPD (for multipurpose daemon), is to be scalable. By this we mean that startup of interactive parallel jobs comprising thousands of processes is quick, that signals can be quickly delivered to processes, and that stdin, stdout, and stderr are managed intuitively. Our primary target is parallel machines made up of clusters of SMPs, but the system is also useful in more tightly integrated environments. We describe how MPD enables much faster startup and better runtime management of parallel jobs. We show how close control of stdio can support the easy implementation of a number of convenient system utilities, even a parallel debugger. We describe a…
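To make the idea of managed stdio concrete, here is a minimal sketch of collecting output from several processes and labeling each line with the rank of the process that produced it. It is not MPD's mechanism, which the abstract does not detail. The function name `run_labeled`, the convention of passing the rank as a command-line argument, and the `[rank]` prefix format are all assumptions made for illustration.

```python
import subprocess
import sys


def run_labeled(num_procs, child_code):
    """Start num_procs children and return their stdout lines, each prefixed with a rank.

    HYPOTHETICAL helper: a real process manager would stream output as it
    arrives rather than collecting it after each child exits.
    """
    procs = []
    for rank in range(num_procs):
        # Assumption: each child learns its rank from its first argument.
        p = subprocess.Popen(
            [sys.executable, "-c", child_code, str(rank)],
            stdout=subprocess.PIPE,
            text=True,
        )
        procs.append(p)

    labeled = []
    for rank, p in enumerate(procs):
        out, _ = p.communicate()  # wait for the child and read all of its stdout
        for line in out.splitlines():
            labeled.append(f"[{rank}] {line}")
    return labeled


if __name__ == "__main__":
    child = "import sys; print('hello from rank', sys.argv[1])"
    for line in run_labeled(3, child):
        print(line)
```

The children run concurrently, but their output is gathered in rank order, so the merged result is deterministic. Tagging lines by rank is one simple way a launcher can keep the output of many processes readable on a single console.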
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies
