A Billion Updates per Second Using 30,000 Hierarchical In-Memory D4M Databases
Jeremy Kepner, Vijay Gadepally, Lauren Milechin, Siddharth Samsi, William Arcand, David Bestor, William Bergeron, Chansup Byun, Matthew Hubbell, Micheal Houle, Micheal Jones, Anne Klein, Peter Michaleas, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Albert Reuther

TL;DR
This paper demonstrates a hierarchical in-memory database system built on D4M associative arrays that sustains 1.9 billion streaming updates per second across 34,000 instances on 1,100 server nodes, enabling analysis of extremely large streaming network data.
Contribution
The paper introduces a hierarchical implementation of associative arrays in D4M and shows that it sustains 1.9 billion streaming updates per second when run as 34,000 instances on 1,100 server nodes of the MIT SuperCloud.
Findings
Achieved 1.9 billion updates per second with 34,000 instances.
Enabled real-time analysis of extremely large streaming network data.
Demonstrated scalability across 1,100 server nodes.
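As a back-of-envelope reading of these figures, the aggregate rate can be divided across instances and nodes. This is only a sketch: it assumes the load is spread evenly, which the summary does not state, and the variable names are illustrative.

```python
# Figures quoted in the findings above.
total_updates_per_sec = 1_900_000_000
instances = 34_000
server_nodes = 1_100

# Assumption: updates are distributed evenly across instances and nodes.
per_instance = total_updates_per_sec / instances
per_node = total_updates_per_sec / server_nodes
instances_per_node = instances / server_nodes

print(f"{per_instance:,.0f} updates/s per instance")
print(f"{per_node:,.0f} updates/s per server node")
print(f"{instances_per_node:.1f} instances per server node")
```

Under that even-split assumption, each instance handles roughly 56,000 updates per second and each node hosts about 31 instances.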
Abstract
Analyzing large-scale networks requires high-performance streaming updates of graph representations of these data. Associative arrays are mathematical objects combining properties of spreadsheets, databases, matrices, and graphs, and are well-suited for representing and analyzing streaming network data. The Dynamic Distributed Dimensional Data Model (D4M) library implements associative arrays in a variety of languages (Python, Julia, and Matlab/Octave) and provides a lightweight in-memory database. Associative arrays are designed for block updates. Streaming updates to a large associative array require a hierarchical implementation to optimize the performance of the memory hierarchy. Running 34,000 instances of hierarchical D4M associative arrays on 1,100 server nodes on the MIT SuperCloud achieved a sustained update rate of 1,900,000,000 updates per second. This capability allows…
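The idea of a hierarchical associative array can be illustrated with a small sketch. It does not use the D4M API; it assumes a cascade of layers in which small, fast layers absorb incoming updates and are merged into larger layers once they exceed a size threshold. The class name `HierarchicalAssoc`, the `cuts` parameter, and the use of `collections.Counter` as the sparse store are all hypothetical choices made for illustration.

```python
from collections import Counter


class HierarchicalAssoc:
    """Toy hierarchical associative array mapping (row, col) -> summed value."""

    def __init__(self, cuts):
        # cuts[i] is the largest number of stored entries layer i may hold
        # before it is merged into layer i + 1 (hypothetical parameter).
        self.cuts = list(cuts)
        # One more layer than thresholds: the last layer is unbounded.
        self.layers = [Counter() for _ in range(len(self.cuts) + 1)]

    def update(self, triples):
        """Add a block of (row, col, value) triples to the smallest layer."""
        first = self.layers[0]
        for row, col, val in triples:
            first[(row, col)] += val
        self._cascade()

    def _cascade(self):
        # Push any over-full layer down into the next, larger layer.
        for i, cut in enumerate(self.cuts):
            if len(self.layers[i]) <= cut:
                break  # deeper layers were not touched, so stop here
            self.layers[i + 1].update(self.layers[i])  # adds values per key
            self.layers[i].clear()

    def total(self):
        """Sum all layers into a single view for querying."""
        out = Counter()
        for layer in self.layers:
            out.update(layer)
        return out


h = HierarchicalAssoc(cuts=[2, 4])
h.update([("a", "b", 1), ("a", "c", 1)])  # layer 0 holds 2 entries, no merge
h.update([("a", "b", 1), ("d", "e", 1)])  # layer 0 holds 3 entries, merged down
print(h.total()[("a", "b")])
```

In this sketch most updates touch only the small first layer, which is the kind of memory-hierarchy behavior the abstract describes; the thresholds used here are arbitrary and are not taken from the paper.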
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems · Scientific Computing and Data Management
