Streaming 1.9 Billion Hypersparse Network Updates per Second with D4M

Jeremy Kepner; Vijay Gadepally; Lauren Milechin; Siddharth Samsi; William Arcand; David Bestor; William Bergeron; Chansup Byun; Matthew Hubbell; Michael Houle; Michael Jones; Anne Klein; Peter Michaleas; Julie Mullen; Andrew Prout; Antonio Rosa; Charles Yee; Albert Reuther

arXiv:1907.04217·cs.DC·December 3, 2019


TL;DR

This paper presents a highly optimized hierarchical associative array implementation in D4M that achieves over 1.9 billion updates per second, enabling real-time analysis of massive streaming network data.

Contribution

It introduces a hierarchical associative array design that significantly boosts update rates, scalable across thousands of servers for large-scale network data analysis.

Findings

1. Achieved over 40,000 updates/sec in a single instance.

2. Scaled to 34,000 instances on 1,100 servers, reaching 1.9 billion updates/sec.

3. Enabled real-time analysis of extremely large streaming network data.

Equations

$$\mathbf{A}: K_{1}\times K_{2}\to\mathbb{V}$$

$$\mathbf{A}=\mathbb{A}(\mathbf{k}_{1},\mathbf{k}_{2},\mathbf{v})$$

$$\mathbb{I}(\mathbf{k}_{1},\mathbf{k}_{2})=\mathbb{A}(\mathbf{k}_{1},\mathbf{k}_{2},1)$$

$$\mathbf{C}=\mathbf{A}\oplus\mathbf{B}$$

$$\mathbf{C}(k_{1},k_{2})=\mathbf{A}(k_{1},k_{2})\oplus\mathbf{B}(k_{1},k_{2})$$

$$\mathbf{C}=\mathbf{A}\otimes\mathbf{B}$$

$$\mathbf{C}(k_{1},k_{2})=\mathbf{A}(k_{1},k_{2})\otimes\mathbf{B}(k_{1},k_{2})$$

$$\mathbf{C}=\mathbf{A}\mathbf{B}=\mathbf{A}\,{\oplus}.{\otimes}\,\mathbf{B}$$

$$\mathbf{C}(k_{1},k_{2})=\bigoplus_{k}\mathbf{A}(k_{1},k)\otimes\mathbf{B}(k,k_{2})$$

$$\mathbf{A}(k_{2},k_{1})=\mathbf{A}^{T}(k_{1},k_{2})$$

$$\mathbf{A}\oplus\mathbf{B}=\mathbf{B}\oplus\mathbf{A}\qquad\mathbf{A}\otimes\mathbf{B}=\mathbf{B}\otimes\mathbf{A}\qquad(\mathbf{A}\mathbf{B})^{T}=\mathbf{B}^{T}\mathbf{A}^{T}$$

$$(\mathbf{A}\oplus\mathbf{B})\oplus\mathbf{C}=\mathbf{A}\oplus(\mathbf{B}\oplus\mathbf{C})\qquad(\mathbf{A}\otimes\mathbf{B})\otimes\mathbf{C}=\mathbf{A}\otimes(\mathbf{B}\otimes\mathbf{C})\qquad(\mathbf{A}\mathbf{B})\mathbf{C}=\mathbf{A}(\mathbf{B}\mathbf{C})$$

$$\mathbf{A}\otimes(\mathbf{B}\oplus\mathbf{C})=(\mathbf{A}\otimes\mathbf{B})\oplus(\mathbf{A}\otimes\mathbf{C})\qquad\mathbf{A}(\mathbf{B}\oplus\mathbf{C})=(\mathbf{A}\mathbf{B})\oplus(\mathbf{A}\mathbf{C})$$

$$\mathbf{A}\oplus\mathbf{0}=\mathbf{A}\qquad\mathbf{A}\otimes\mathbf{1}=\mathbf{A}\qquad\mathbf{A}\mathbb{I}=\mathbf{A}$$

$$\mathbf{A}\otimes\mathbf{0}=\mathbf{0}\qquad\mathbf{A}\mathbf{0}=\mathbf{0}$$

$$\mathbf{A}_{1}=\mathbf{A}_{1}+\mathbf{A}$$

$$\mathbf{A}_{2}=\mathbf{A}_{2}+\mathbf{A}_{1}$$

$$\mathbf{A}=\sum_{i=1}^{N}\mathbf{A}_{i}$$

Full text

Streaming 1.9 Billion Hypersparse Network Updates per Second with D4M

Jeremy Kepner¹,²,³, Vijay Gadepally¹,², Lauren Milechin⁴, Siddharth Samsi¹, William Arcand¹, David Bestor¹, William Bergeron¹, Chansup Byun¹, Matthew Hubbell¹, Michael Houle¹, Michael Jones¹, Anne Klein¹, Peter Michaleas¹, Julie Mullen¹, Andrew Prout¹, Antonio Rosa¹, Charles Yee¹, Albert Reuther¹

¹MIT Lincoln Laboratory Supercomputing Center, ²MIT Computer Science & AI Laboratory, ³MIT Mathematics Department, ⁴MIT Department of Earth, Atmospheric and Planetary Sciences

Abstract

The Dynamic Distributed Dimensional Data Model (D4M) library implements associative arrays in a variety of languages (Python, Julia, and Matlab/Octave) and provides a lightweight in-memory database implementation of hypersparse arrays that are ideal for analyzing many types of network data. D4M relies on associative arrays which combine properties of spreadsheets, databases, matrices, graphs, and networks, while providing rigorous mathematical guarantees, such as linearity. Streaming updates of D4M associative arrays put enormous pressure on the memory hierarchy. This work describes the design and performance optimization of an implementation of hierarchical associative arrays that reduces memory pressure and dramatically increases the update rate into an associative array. The parameters of hierarchical associative arrays rely on controlling the number of entries in each level in the hierarchy before an update is cascaded. The parameters are easily tunable to achieve optimal performance for a variety of applications. Hierarchical arrays achieve over 40,000 updates per second in a single instance. Scaling to 34,000 instances of hierarchical D4M associative arrays on 1,100 server nodes on the MIT SuperCloud achieved a sustained update rate of 1,900,000,000 updates per second. This capability allows the MIT SuperCloud to analyze extremely large streaming network data sets.

I Introduction

This material is based upon work supported by the Assistant Secretary of Defense for Research and Engineering under Air Force Contract No. FA8702-15-D-0001 and National Science Foundation grants DMS-1312831 and CCF-1533644. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Assistant Secretary of Defense for Research and Engineering or the National Science Foundation.

Networks form the basis of worldwide communication and are estimated to produce many terabytes per second (TB/s) of Internet traffic. Sophisticated cyber threats are increasing rapidly. Thousands of new malware programs are registered each day, and a majority of all web traffic comes from bots, many of which are malicious in nature. High performance analysis of network traffic is critical for defending the Internet. An important technical challenge for analyzing streaming network data is the need to rapidly build and store analyzable network representations of the data [1, 2, 3, 4, 5, 6, 7, 8]. In the case of IP network traffic data, there is the additional challenge that the IP address space is much larger than what is observed in a typical data collection; thus the data are hypersparse and require dealing with potentially new endpoint labels at every update.

Development of novel computer network analytics requires high-level programming environments, massive amounts of network data, and diverse data products for “at scale” algorithm pipeline development. Our team has developed a scalable network analytics platform using the D4M (Dynamic Distributed Dimensional Data Model, d4m.mit.edu) analytics environment and the MIT SuperCloud interactive computing environment [9, 10]. D4M combines the power of sparse linear algebra, associative arrays, parallel processing, and distributed databases (such as SciDB and Apache Accumulo) to provide a scalable data and computation system that addresses the big data problems associated with network analytics development. The MIT SuperCloud allows users to interactively process massive amounts of data in minutes on many thousands of cores while using the software and environments most familiar to them. A key challenge for this pipeline is handling streaming updates of large networks in hypersparse representation. This paper describes the implementation of a hierarchical approach designed to optimize the performance of the memory hierarchy. First, the associative array mathematics that is the basis of D4M is presented. Next, the design of the hierarchical associative array that leverages these mathematics is given. The hierarchical array tuning parameters are described, and results are presented for different parameter settings. Finally, the performance results of using 34,000 instances on 1,100 compute nodes are presented.

II Associative Array Mathematics

Analyzing large-scale networks requires high performance streaming updates of graph representations of these data. Associative arrays are mathematical objects combining properties of spreadsheets, databases, matrices, and graphs, and are well-suited for representing and analyzing streaming network data (see Fig. 1). In many databases, these table operations can be mapped onto well-defined mathematical operations with known mathematical properties. For example, relational (or SQL) databases [11, 12, 13] are described by relational algebra [14, 15, 16] that corresponds to the union-intersection semiring ${\cup}.{\cap}$ [17]. Triple-store databases (NoSQL) [18, 19, 20, 21, 22] and analytic databases (NewSQL) [23, 24, 25, 26, 27, 28] follow similar mathematics [29]. The table operations of these databases are further encompassed by associative array algebra, which brings the beneficial properties of matrix mathematics and sparse linear systems theory, such as closure, commutativity, associativity, and distributivity [30]. The aforementioned mathematical properties provide strong correctness and linearity guarantees that are independent of scale and particularly helpful when trying to reason about massively parallel systems.

The full mathematics of associative arrays and the ways they encompass matrix mathematics and relational algebra are described in the aforementioned references [17, 29, 30]. Only the essential mathematical properties of associative arrays are reviewed here. The essence of associative array algebra is three operations: element-wise addition (database table union), element-wise multiplication (database table intersection), and array multiplication (database table transformation). In brief, an associative array $\mathbf{A}$ is defined as a mapping from sets of keys to values

$$\mathbf{A}: K_{1}\times K_{2}\to\mathbb{V}$$

where $K_{1}$ are the row keys and $K_{2}$ are the column keys and can be any sortable set, such as integers, real numbers, and strings. The row keys are equivalent to the sequence ID in a relational database table. The column keys are equivalent to the column names in a database table. $\mathbb{V}$ is a set of values that forms a semiring $(\mathbb{V},\oplus,\otimes,0,1)$ with addition operation $\oplus$, multiplication operation $\otimes$, additive identity/multiplicative annihilator 0, and multiplicative identity 1. The values can take on many forms, such as numbers, strings, and sets. One of the most powerful features of associative arrays is that addition and multiplication can be a wide variety of operations. Some of the common combinations of addition and multiplication operations that have proven valuable are standard arithmetic addition and multiplication ${+}.{\times}$, the aforementioned union and intersection ${\cup}.{\cap}$, and various tropical algebras that are important in finance [31, 32, 33] and neural networks [34]: ${\max}.{+}$, ${\min}.{+}$, ${\max}.{\times}$, ${\min}.{\times}$, ${\max}.{\min}$, and ${\min}.{\max}$.
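The idea that $\oplus$ and $\otimes$ can be swapped out may be easier to see in code. The following is a minimal Python sketch, not part of the D4M library: the `Semiring` tuple, the three constants, and the `dot` helper are hypothetical names introduced only for illustration. It evaluates the same generalized inner product under ${+}.{\times}$, ${\min}.{+}$, and ${\max}.{+}$.

```python
from functools import reduce
from typing import Any, Callable, NamedTuple


class Semiring(NamedTuple):
    """Illustrative container for (add, mul, zero, one); not a D4M type."""
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]
    zero: Any  # additive identity and multiplicative annihilator
    one: Any   # multiplicative identity


PLUS_TIMES = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)
MIN_PLUS = Semiring(min, lambda a, b: a + b, float("inf"), 0)
MAX_PLUS = Semiring(max, lambda a, b: a + b, float("-inf"), 0)


def dot(sr, xs, ys):
    """Fold sr.add over the pairwise sr.mul of xs and ys, starting from sr.zero."""
    return reduce(sr.add, (sr.mul(x, y) for x, y in zip(xs, ys)), sr.zero)


xs, ys = [1, 2, 3], [4, 5, 6]
print(dot(PLUS_TIMES, xs, ys))  # 1*4 + 2*5 + 3*6
print(dot(MIN_PLUS, xs, ys))    # min(1+4, 2+5, 3+6)
print(dot(MAX_PLUS, xs, ys))    # max(1+4, 2+5, 3+6)
```

In each case `zero` is the identity for `add` and the annihilator for `mul`, which is the role the text assigns to the semiring element 0.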

The construction of an associative array is denoted

$$\mathbf{A}=\mathbb{A}(\mathbf{k}_{1},\mathbf{k}_{2},\mathbf{v})$$

where $\mathbf{k}_{1}$, $\mathbf{k}_{2}$, and $\mathbf{v}$ are vectors of the row keys, column keys, and values of the nonzero elements of $\mathbf{A}$. When the values are 1 and there is only one nonzero entry per row or column, this associative array is denoted

$$\mathbb{I}(\mathbf{k}_{1},\mathbf{k}_{2})=\mathbb{A}(\mathbf{k}_{1},\mathbf{k}_{2},1)$$

and when $\mathbb{I}(\mathbf{k})=\mathbb{I}(\mathbf{k},\mathbf{k})$, this array is referred to as the identity.

Given associative arrays $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$, element-wise addition is denoted

$$\mathbf{C}=\mathbf{A}\oplus\mathbf{B}$$

or more specifically

$$\mathbf{C}(k_{1},k_{2})=\mathbf{A}(k_{1},k_{2})\oplus\mathbf{B}(k_{1},k_{2})$$

where $k_{1}\in K_{1}$ and $k_{2}\in K_{2}$ . Similarly, element-wise multiplication is denoted

$$\mathbf{C}=\mathbf{A}\otimes\mathbf{B}$$

or more specifically

$$\mathbf{C}(k_{1},k_{2})=\mathbf{A}(k_{1},k_{2})\otimes\mathbf{B}(k_{1},k_{2})$$

Array multiplication combines addition and multiplication and is written

$$\mathbf{C}=\mathbf{A}\mathbf{B}=\mathbf{A}\,{\oplus}.{\otimes}\,\mathbf{B}$$

or more specifically

$$\mathbf{C}(k_{1},k_{2})=\bigoplus_{k}\mathbf{A}(k_{1},k)\otimes\mathbf{B}(k,k_{2})$$

where $k$ corresponds to the column key of $\mathbf{A}$ and the row key of $\mathbf{B}$ . Finally, the array transpose is denoted

$$\mathbf{A}(k_{2},k_{1})=\mathbf{A}^{T}(k_{1},k_{2})$$
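The four operations defined above can be sketched with plain Python dictionaries keyed by `(row_key, column_key)`. This is an illustrative model only: it is not the D4M implementation (which the paper describes as sorted strings plus a sparse matrix), and the function names `assoc`, `ewise_add`, `ewise_mul`, `matmul`, and `transpose` are hypothetical. The sketch defaults to the ${+}.{\times}$ semiring, assumes the key pairs passed to `assoc` are distinct, and does not prune results that happen to equal the semiring zero.

```python
from collections import defaultdict


def assoc(k1, k2, v):
    """Build {(row_key, col_key): value} from parallel lists; v may be a scalar."""
    vals = v if isinstance(v, list) else [v] * len(k1)
    return dict(zip(zip(k1, k2), vals))


def ewise_add(A, B, add=lambda a, b: a + b):
    """C = A (+) B: union of the stored keys; shared keys are combined with add."""
    C = dict(A)
    for key, val in B.items():
        C[key] = add(C[key], val) if key in C else val
    return C


def ewise_mul(A, B, mul=lambda a, b: a * b):
    """C = A (x) B: intersection of the stored keys."""
    return {key: mul(A[key], B[key]) for key in A.keys() & B.keys()}


def matmul(A, B, add=lambda a, b: a + b, mul=lambda a, b: a * b):
    """C(k1, k2) = (+)_k A(k1, k) (x) B(k, k2)."""
    rows_of_B = defaultdict(list)
    for (k, k2), val in B.items():
        rows_of_B[k].append((k2, val))
    C = {}
    for (k1, k), a in A.items():
        for k2, b in rows_of_B.get(k, []):
            prod = mul(a, b)
            C[(k1, k2)] = add(C[(k1, k2)], prod) if (k1, k2) in C else prod
    return C


def transpose(A):
    """Swap row and column keys."""
    return {(k2, k1): val for (k1, k2), val in A.items()}


# A tiny directed graph a->b, a->c, b->c with edge weight 1.
A = assoc(["a", "a", "b"], ["b", "c", "c"], 1)
B = assoc(["a", "c"], ["b", "a"], [10, 5])
print(ewise_add(A, B))   # table union
print(ewise_mul(A, B))   # table intersection
print(matmul(A, A))      # two-hop paths
print(transpose(A))      # reversed edges
```

Because the keys are ordinary hashable labels, new row or column labels need no preallocation, which is the property that makes this kind of structure convenient for hypersparse data.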

The above operations have been found to enable a wide range of database algorithms and matrix mathematics while also preserving several valuable mathematical properties that ensure the correctness of out-of-order execution. These properties include commutativity

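$$\mathbf{A}\oplus\mathbf{B}=\mathbf{B}\oplus\mathbf{A}\qquad\mathbf{A}\otimes\mathbf{B}=\mathbf{B}\otimes\mathbf{A}\qquad(\mathbf{A}\mathbf{B})^{T}=\mathbf{B}^{T}\mathbf{A}^{T}$$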

associativity

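$$(\mathbf{A}\oplus\mathbf{B})\oplus\mathbf{C}=\mathbf{A}\oplus(\mathbf{B}\oplus\mathbf{C})\qquad(\mathbf{A}\otimes\mathbf{B})\otimes\mathbf{C}=\mathbf{A}\otimes(\mathbf{B}\otimes\mathbf{C})\qquad(\mathbf{A}\mathbf{B})\mathbf{C}=\mathbf{A}(\mathbf{B}\mathbf{C})$$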

distributivity

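$$\mathbf{A}\otimes(\mathbf{B}\oplus\mathbf{C})=(\mathbf{A}\otimes\mathbf{B})\oplus(\mathbf{A}\otimes\mathbf{C})\qquad\mathbf{A}(\mathbf{B}\oplus\mathbf{C})=(\mathbf{A}\mathbf{B})\oplus(\mathbf{A}\mathbf{C})$$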

and the additive and multiplicative identities

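$$\mathbf{A}\oplus\mathbf{0}=\mathbf{A}\qquad\mathbf{A}\otimes\mathbf{1}=\mathbf{A}\qquad\mathbf{A}\mathbb{I}=\mathbf{A}$$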

where $\mathbf{0}$ is an array of all 0, $\mathbf{1}$ is an array of all 1, and $\mathbb{I}$ is an array with 1 along its diagonal. Furthermore, these arrays possess a multiplicative annihilator

$$\mathbf{A}\otimes\mathbf{0}=\mathbf{0}\qquad\mathbf{A}\mathbf{0}=\mathbf{0}$$

Most significantly, the properties of associative arrays are determined by the properties of the value set $\mathbb{V}$ . In other words, if $\mathbb{V}$ is linear (distributive), then so are the corresponding associative arrays.
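A small check can make the practical consequence of commutativity and associativity concrete: batches of updates can be summed in any order or grouping and still yield the same array. The sketch below is a hedged illustration in plain Python, assuming ordinary integer addition as $\oplus$; the `add` helper and the sample endpoint labels are invented for the example and do not come from the paper.

```python
import itertools
from functools import reduce


def add(A, B):
    """Element-wise addition of two {(row_key, col_key): count} dictionaries."""
    C = dict(A)
    for key, val in B.items():
        C[key] = C.get(key, 0) + val
    return C


# Three made-up batches of (source, destination) packet counts.
batches = [
    {("10.0.0.1", "10.0.0.2"): 1, ("10.0.0.1", "10.0.0.3"): 2},
    {("10.0.0.1", "10.0.0.2"): 4},
    {("10.0.0.9", "10.0.0.2"): 3, ("10.0.0.1", "10.0.0.3"): 1},
]

reference = reduce(add, batches, {})

# Commutativity: every arrival order produces the same array.
all_orders_agree = all(
    reduce(add, order, {}) == reference
    for order in itertools.permutations(batches)
)

# Associativity: regrouping the additions does not change the result.
left = add(add(batches[0], batches[1]), batches[2])
right = add(batches[0], add(batches[1], batches[2]))

print(all_orders_agree, left == right)
```

This order independence is what allows partial sums to be held in separate arrays and combined later without affecting the final answer.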

Intersection $\cap$ distributing over union $\cup$ is essential to database query planning and parallel query execution over partitioned/sharded database tables [35, 36, 37, 38, 39, 40, 41]. Similarly, matrix multiplication distributing over matrix addition ensures the correctness of massively parallel implementations on the world’s largest supercomputers [42] and machine learning systems [43, 44, 45]. In software engineering, the scalable commutativity rule guarantees the existence of a conflict-free (parallel) implementation [46, 47, 48]. The full mathematics of associative arrays and the ways they encompass matrix mathematics and relational algebra are described in the aforementioned references [17, 29, 30].

III Hierarchical Associative Arrays

The D4M library implements associative arrays in a variety of languages (Python, Julia, and Matlab/Octave) and provides a lightweight in-memory database. Most implementations use sorted strings for the row, column, and value labels, and a standard sparse matrix to connect the triples. Associative arrays are designed for block updates. Streaming updates to a large associative array require a hierarchical implementation to optimize the performance of the memory hierarchy (see Fig. 2). Rapid updates are performed on the smallest arrays in the fastest memory. The strong mathematical properties of associative arrays allow a hierarchical implementation of associative arrays to be implemented via simple addition. All creation and organization of hypersparse row and column labels are handled naturally by the associative array mathematics. If the number of nonzero (nnz) entries exceeds the threshold $c_{i}$, then ${\bf A}_{i}$ is added to ${\bf A}_{i+1}$ and ${\bf A}_{i}$ is cleared. The overall usage is as follows:

• Initialize $N$-layer hierarchical array with cuts $c_{i}$

• Update by adding data ${\bf A}$ to lowest layer

$${\bf A}_{1}={\bf A}_{1}+{\bf A}$$

• If ${\rm nnz}({\bf A}_{1})>c_{1}$, then

$${\bf A}_{2}={\bf A}_{2}+{\bf A}_{1}$$

and clear ${\bf A}_{1}$.

The above steps are repeated until ${\rm nnz}({\bf A}_{i})\leq c_{i}$ or $i=N$ . To complete all pending updates for analysis, all the layers are added together

$${\bf A}=\sum_{i=1}^{N}{\bf A}_{i}$$

Hierarchical arrays dramatically reduce the number of updates to slow memory. Upon query, all layers in the hierarchy are summed into the largest array. The cut values $c_{i}$ can be selected so as to optimize the performance with respect to particular applications. The majority of the complex updating is performed by using the existing D4M associative array addition operation. The corresponding Matlab/Octave D4M code for performing the update is a direct translation of the above mathematics:

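function Ai = HierAdd(Ai,A,c)
  % Ai: cell array of length(c)+1 associative arrays, c: cut values.
  % Add the update A to the lowest (fastest) layer.
  Ai{1} = Ai{1} + A;
  % Cascade any layer whose nonzero count exceeds its cut.
  for i=1:length(c)
    if (nnz(Ai{i}) > c(i))
      Ai{i+1} = Ai{i+1} + Ai{i};
      Ai{i} = Assoc('','','');   % clear layer i
    end
  end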
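For readers without a Matlab/Octave D4M installation, the same cascade can be sketched in plain Python. This is an assumption-laden illustration rather than the D4M Python API: layers are modeled as dictionaries keyed by `(row_key, column_key)`, `len(layer)` stands in for `nnz`, and the names `assoc_add`, `hier_add`, and `hier_total` are hypothetical.

```python
from functools import reduce


def assoc_add(A, B):
    """Element-wise addition of two {(row_key, col_key): value} dictionaries."""
    C = dict(A)
    for key, val in B.items():
        C[key] = C.get(key, 0) + val
    return C


def hier_add(layers, A, cuts):
    """Add update A to layer 0, then cascade layers that exceed their cut.

    layers: list of len(cuts) + 1 dictionaries, smallest layer first.
    cuts:   thresholds c_i on the number of stored entries per layer.
    """
    layers[0] = assoc_add(layers[0], A)
    for i, cut in enumerate(cuts):
        if len(layers[i]) > cut:
            layers[i + 1] = assoc_add(layers[i + 1], layers[i])
            layers[i] = {}  # clear the layer that was pushed up
    return layers


def hier_total(layers):
    """Complete all pending updates by summing every layer."""
    return reduce(assoc_add, layers, {})


# Stream ten single-edge updates through a 3-layer hierarchy with small cuts.
layers = [{}, {}, {}]
cuts = [2, 4]
for i in range(10):
    layers = hier_add(layers, {(f"src{i}", f"dst{i}"): 1}, cuts)

print([len(layer) for layer in layers])
print(len(hier_total(layers)))
```

The cut values here are deliberately tiny so the cascade is visible after a handful of updates; the paper's experiments use far larger cuts.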

IV Performance Optimization

The performance of a hierarchical associative array for any particular problem is determined by the number of layers $N$ and the cut values $c_{i}$. The parameters are tuned to achieve optimal performance for a given problem. Examples of two sets of cut values with different $c_{1}$ and different ratios between cut values are shown in Figure 3. These sets of cut values allow exploration of the update performance of many closely spaced cuts versus few widely spaced cuts. Figure 4 shows the single-instance (single processor core) performance for different numbers of layers and cut values on simulated Graph500.org R-MAT power-law network data. The data set contains 100,000,000 connections that are inserted in groups of 100,000. In general, the hierarchical implementation performs much better than the non-hierarchical implementation (0 cuts), and performance increases until the last cut is above the total number of entries in the data. The instantaneous insert rate of each group of network connections is shown in Figure 5 and indicates that more cuts allow more updates to occur quickly because they are taking place in faster memory.
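To get a rough feel for how cuts reduce traffic to the largest layer, the cascade can be simulated by counting entries alone. The sketch below is a simplified model, not a reproduction of the paper's measurements: it assumes every inserted connection has a distinct key (so nnz simply accumulates), and the cut values `[1_000_000, 10_000_000]` are hypothetical because the values plotted in Figure 3 are not given in the text. Only the data set size (100,000,000) and group size (100,000) come from the paper.

```python
def updates_per_layer(total, group, cuts):
    """Count how many add operations each layer receives.

    Simplifying assumption: all keys are distinct, so a layer's nnz is just
    the number of entries added to it since it was last cleared.
    """
    n_layers = len(cuts) + 1
    nnz = [0] * n_layers
    updates = [0] * n_layers
    for _ in range(total // group):
        nnz[0] += group
        updates[0] += 1
        for i, cut in enumerate(cuts):
            if nnz[i] > cut:
                nnz[i + 1] += nnz[i]
                updates[i + 1] += 1
                nnz[i] = 0
    return updates, nnz


TOTAL = 100_000_000   # connections in the data set (from the text)
GROUP = 100_000       # connections per insert (from the text)

flat_updates, _ = updates_per_layer(TOTAL, GROUP, [])
hier_updates, hier_nnz = updates_per_layer(TOTAL, GROUP, [1_000_000, 10_000_000])

print(flat_updates)   # every insert touches the single large array
print(hier_updates)   # inserts reaching each layer, smallest layer first
print(hier_nnz)       # entries left in each layer at the end
```

Under these assumed cuts, the largest layer is updated 9 times rather than 1,000 times, which is the effect the hierarchy is designed to produce. Real data with repeated keys would merge entries and cascade less often than this model suggests.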

V Scalability Results

The scalability of the hierarchical associative arrays is tested using a power-law graph of 100,000,000 entries divided up into 1,000 sets of 100,000 entries. These data were then simultaneously loaded and updated using a varying number of processes on a varying number of nodes on the MIT SuperCloud up to 1,100 servers with 34,000 processors. This experiment mimics thousands of processors, each creating many different graphs of 100,000,000 edges each. In a real analysis application, each process would also compute various network statistics on each of the streams as they are updated. The update rate as a function of number of server nodes is shown in Fig. 6. The achieved update rate of 1,900,000,000 updates per second is significantly larger than the rate in prior published results. This capability allows the MIT SuperCloud to analyze extremely large streaming network data sets.
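As a quick consistency check on the reported figures, the aggregate rate can be divided by the instance and node counts stated above. The following is back-of-the-envelope arithmetic on the paper's rounded numbers, not an additional measurement:

```latex
\begin{align*}
\text{rate per instance} &\approx \frac{1.9\times 10^{9}\ \text{updates/s}}{34{,}000\ \text{instances}}
  \approx 5.6\times 10^{4}\ \text{updates/s},\\[4pt]
\text{instances per node} &\approx \frac{34{,}000}{1{,}100} \approx 31,\\[4pt]
\text{rate per node} &\approx \frac{1.9\times 10^{9}\ \text{updates/s}}{1{,}100\ \text{nodes}}
  \approx 1.7\times 10^{6}\ \text{updates/s}.
\end{align*}
```

The per-instance figure of roughly 56,000 updates per second is in line with the "over 40,000 updates per second" reported for a single instance.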

VI Conclusion

The D4M implementation of associative arrays provides a lightweight in-memory database ideal for analyzing hypersparse network data. Associative array mathematics combines properties of spreadsheets, databases, matrices, graphs, and networks and provides strong linearity guarantees. Streaming data into associative arrays puts enormous pressure on a memory hierarchy. D4M hierarchical associative arrays reduce memory pressure and increase update performance. The linearity properties of associative arrays allow a hierarchical associative array to be implemented using simple addition operations. The performance of hierarchical associative arrays comes from controlling the number of entries at each level and can be tuned for any particular application. Hierarchical arrays achieve over 40,000 updates per second in a single instance and are significantly faster than non-hierarchical associative arrays. Scaling to 34,000 instances of hierarchical D4M associative arrays on 1,100 server nodes on the MIT SuperCloud achieved a sustained update rate of 1,900,000,000 updates per second.

Acknowledgement

The authors wish to acknowledge the following individuals for their contributions and support: Bob Bond, Alan Edelman, Charles Leiserson, Dave Martinez, Mimi McClure, Victor Roytburd, Michael Wright.

Bibliography

[1] V. G. Castellana, M. Minutoli, S. Bhatt, K. Agarwal, A. Bleeker, J. Feo, D. Chavarría-Miranda, and D. Haglin, “High-performance data analytics beyond the relational and graph data models with gems,” in 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1029–1038, IEEE, 2017.
[2] F. Busato, O. Green, N. Bombieri, and D. A. Bader, “Hornet: An efficient data structure for dynamic sparse graphs and matrices on gpus,” in 2018 IEEE High Performance extreme Computing Conference (HPEC), pp. 1–7, IEEE, 2018.
[3] A. Yaşar, S. Rajamanickam, M. Wolf, J. Berry, and Ü. V. Çatalyürek, “Fast triangle counting using cilk,” in 2018 IEEE High Performance extreme Computing Conference (HPEC), pp. 1–7, Sep. 2018.
[4] Y. Hu, H. Liu, and H. H. Huang, “High-performance triangle counting on gpus,” in 2018 IEEE High Performance extreme Computing Conference (HPEC), pp. 1–5, Sep. 2018.
[5] M. Bisson and M. Fatica, “Update on static graph challenge on gpu,” in 2018 IEEE High Performance extreme Computing Conference (HPEC), pp. 1–8, Sep. 2018.
[6] R. Pearce and G. Sanders, “K-truss decomposition for scale-free graphs at scale in distributed memory,” in 2018 IEEE High Performance extreme Computing Conference (HPEC), pp. 1–6, Sep. 2018.
[7] S. Samsi, V. Gadepally, M. Hurley, M. Jones, E. Kao, S. Mohindra, P. Monticciolo, A. Reuther, S. Smith, W. Song, et al., “Static graph challenge: Subgraph isomorphism,” in 2017 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–6, IEEE, 2017.
[8] E. Kao, V. Gadepally, M. Hurley, M. Jones, J. Kepner, S. Mohindra, P. Monticciolo, A. Reuther, S. Samsi, W. Song, et al., “Streaming graph challenge: Stochastic block partition,” in 2017 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–12, IEEE, 2017.