Python Implementation of the Dynamic Distributed Dimensional Data Model
Hayden Jananthan, Lauren Milechin, Michael Jones, William Arcand, William Bergeron, David Bestor, Chansup Byun, Michael Houle, Matthew Hubbell, Vijay Gadepally, Anna Klein, Peter Michaleas, Guillermo Morales, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa

TL;DR
This paper introduces a Python implementation of the Dynamic Distributed Dimensional Data Model (D4M), enabling efficient big data handling with support for databases like Accumulo and SQL, and compares its performance to existing D4M versions.
Contribution
The paper presents D4M.py, a comprehensive Python implementation of D4M with database support and performance benchmarking against the MATLAB and Julia versions.
Findings
D4M.py performs comparably to D4M-MATLAB and D4M.jl.
Supports Accumulo and SQL databases.
Enhances Python's capabilities for big data analysis.
Abstract
Python has become a standard scientific computing language with fast-growing support for machine learning and data analysis modules, as well as increasing use for big data. The Dynamic Distributed Dimensional Data Model (D4M) offers a highly composable, unified data model with strong performance, built to handle big data quickly and efficiently. In this work we present D4M.py, an implementation of D4M in Python. D4M.py implements all foundational functionality of D4M and includes Accumulo and SQL database support via Graphulo. We describe the mathematical background and motivation, explain the approaches taken for its fundamental functions and building blocks, and present performance results that compare D4M.py to D4M-MATLAB and D4M.jl.
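D4M's unified data model is built on associative arrays: sparse arrays indexed by string row and column keys that support array algebra such as element-wise addition and array multiplication. The sketch below is a minimal, hypothetical Python class (`AssocSketch` is an illustrative name, not the D4M.py API) that stores nonzero entries in a dictionary keyed by `(row, col)` pairs. Summing duplicate entries and using ordinary `+` and `*` for multiplication are assumptions of this sketch, not documented D4M.py defaults.

```python
from collections import defaultdict


class AssocSketch:
    """Hypothetical minimal associative array: (row, col) string keys -> numbers.

    Illustrates the D4M data model only; this is not the D4M.py API.
    """

    def __init__(self, triples=()):
        self.data = {}
        for r, c, v in triples:
            # Assumption of this sketch: duplicate (row, col) entries are summed.
            self.data[(r, c)] = self.data.get((r, c), 0) + v

    def __getitem__(self, key):
        # Missing entries read as 0, as in a sparse array.
        return self.data.get(key, 0)

    def __add__(self, other):
        # Element-wise sum over the union of both arrays' keys.
        out = AssocSketch()
        out.data = dict(self.data)
        for k, v in other.data.items():
            out.data[k] = out.data.get(k, 0) + v
        return out

    def transpose(self):
        # Swap row and column keys.
        out = AssocSketch()
        out.data = {(c, r): v for (r, c), v in self.data.items()}
        return out

    def __matmul__(self, other):
        # Array multiplication: (A @ B)[r, c] = sum over k of A[r, k] * B[k, c],
        # where k ranges over keys shared by A's columns and B's rows.
        other_rows = defaultdict(list)
        for (k, c), v in other.data.items():
            other_rows[k].append((c, v))
        out = AssocSketch()
        for (r, k), a in self.data.items():
            for c, b in other_rows.get(k, []):
                out.data[(r, c)] = out.data.get((r, c), 0) + a * b
        return out


# Usage: a document-by-word incidence array.
A = AssocSketch([
    ("doc1", "word|cat", 1),
    ("doc1", "word|dog", 1),
    ("doc2", "word|cat", 1),
])
cooccur = A.transpose() @ A  # word-by-word co-occurrence counts
```

Here `A.transpose() @ A` counts, for each pair of words, how many documents contain both. A dictionary of keys keeps the sketch short; a performance-oriented implementation would more likely keep sorted row and column key lists alongside a sparse numeric matrix.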
