A New Model for Massively Parallel Computation Considering both Communication and IO Cost
Hengzhao Ma, Xiangyu Gao, Jianzhong Li, Tianpeng Gao

TL;DR
This paper introduces a parallel computation model that accounts for both communication cost and IO cost. Earlier models neglected IO, a gap that matters for big data processing, where the data no longer fits in main memory.
Contribution
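The paper proposes what the authors describe as the first parallel computation model to include IO cost alongside non-uniform communication cost. On this model it poses new cost-minimization problems, proves their hardness, and designs approximation algorithms for them.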
Findings
The new model effectively captures IO and communication costs in parallel computation.
Minimizing IO and communication cost on the new model is proved to be computationally hard.
Approximation algorithms for these problems are designed and analyzed, with proven performance bounds.
Abstract
In research on parallel computation, communication cost has been studied extensively, while IO cost has been neglected. For big data computation, the assumption that the data fits in main memory no longer holds, and external memory must be used. It is therefore necessary to bring IO cost into the parallel computation model. In this paper, we propose the first parallel computation model that takes IO cost as well as non-uniform communication cost into consideration. Based on the new model, we pose several new problems that aim to minimize IO and communication cost. We prove the hardness of these problems, then design and analyze approximation algorithms for solving them.
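To make the idea of a combined cost concrete, here is a minimal toy sketch. It is not the model defined in the paper, whose formal details are not given in this summary. Every name and formula below (`io_cost`, `communication_cost`, `total_cost`, the block-based spill count, the per-link cost matrix) is an assumption chosen only to illustrate how an IO term and a non-uniform communication term might be added together.

```python
from math import ceil


def io_cost(data_size, memory_size, block_size):
    """Hypothetical IO cost: blocks that spill to external memory
    when a machine's data exceeds its main memory."""
    overflow = max(0, data_size - memory_size)
    return ceil(overflow / block_size)


def communication_cost(traffic, link_cost):
    """Hypothetical non-uniform communication cost: words sent from
    machine i to machine j, weighted by a per-link cost."""
    n = len(traffic)
    return sum(
        traffic[i][j] * link_cost[i][j]
        for i in range(n)
        for j in range(n)
        if i != j
    )


def total_cost(data_sizes, memory_size, block_size, traffic, link_cost):
    """Sum of the illustrative IO and communication terms."""
    io = sum(io_cost(s, memory_size, block_size) for s in data_sizes)
    return io + communication_cost(traffic, link_cost)


# Two machines: the first holds more data than fits in memory.
data_sizes = [100, 40]
traffic = [[0, 10], [5, 0]]      # words sent between machines
link_cost = [[0, 2], [3, 0]]     # cost per word differs by link

print(total_cost(data_sizes, memory_size=64, block_size=8,
                 traffic=traffic, link_cost=link_cost))
```

In this toy setting, moving data off the overloaded machine lowers the IO term but raises the communication term, which is the kind of trade-off a cost-minimization problem on such a model would have to balance.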
Taxonomy
Topics: Complexity and Algorithms in Graphs · Stochastic Gradient Optimization Techniques · Parallel Computing and Optimization Techniques
