D2O - a distributed data object for parallel high-performance computing in Python
T. Steininger, M. Greiner, F. Beaujean, T. Enßlin

TL;DR
D2O is a Python module for handling distributed multi-dimensional arrays on high-performance computing clusters. Its interface closely mimics numpy, and it aims to be user-friendly and scalable without giving up numerical performance.
Contribution
The paper introduces a portable, easy-to-use Python library that abstracts data distribution for parallel computing without sacrificing performance.
Findings
D2O achieves numpy-like performance in serial applications.
It scales efficiently on MPI clusters.
The library is open-source and easy to modify.
Abstract
We introduce D2O, a Python module for cluster-distributed multi-dimensional numerical arrays. It acts as a layer of abstraction between the algorithm code and the data-distribution logic. The main goal is to achieve usability without losing numerical performance and scalability. D2O's global interface is similar to that of a numpy.ndarray, whereas each cluster node's local data is directly accessible for use in customized high-performance modules. D2O is written in pure Python, which makes it portable and easy to use and modify. Expensive operations are carried out by dedicated external libraries like numpy and mpi4py. The performance of D2O is on a par with numpy for serial applications and scales well when moving to an MPI cluster. D2O is open-source software available under the GNU General Public License v3 (GPL-3) at https://gitlab.mpcdf.mpg.de/ift/D2O
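The split between a global, numpy-like view and node-local data can be illustrated with a small sketch. It does not use D2O's API or MPI: the "ranks" are simulated in a single process with plain numpy, and the helper names `split_along_first_axis` and `distributed_sum` are hypothetical, chosen only for this illustration. On a real cluster each rank would hold just its own chunk, and the final combination step would be an MPI reduction.

```python
import numpy as np


def split_along_first_axis(global_array, n_ranks):
    """Cut the global array into one contiguous slab of rows per simulated rank."""
    return np.array_split(global_array, n_ranks, axis=0)


def distributed_sum(local_chunks):
    """Sum each local chunk, then combine the partial results.

    The final combination stands in for what an MPI reduction would do.
    """
    partial_sums = [chunk.sum() for chunk in local_chunks]
    return sum(partial_sums)


# Global view: an ordinary 4 x 3 array.
global_array = np.arange(12, dtype=np.float64).reshape(4, 3)

# Local view: three simulated ranks, each holding some of the rows.
local_chunks = split_along_first_axis(global_array, n_ranks=3)

# A global operation expressed through purely local work plus one combination.
total = distributed_sum(local_chunks)
```

The point of the sketch is that the caller only asks for a global result, while the work is done on local slabs. That is the kind of bookkeeping an abstraction layer between algorithm code and data-distribution logic is meant to hide.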
