Constructing Adjacency Arrays from Incidence Arrays
Hayden Jananthan, Karia Dibert, Jeremy Kepner

TL;DR
This paper establishes mathematical criteria for constructing adjacency arrays from incidence arrays in graph processing, detailing how different algebraic operations influence the resulting structure, with practical illustrations.
Contribution
It provides the necessary mathematical conditions for accurately deriving adjacency arrays from incidence arrays using various algebraic operations.
Findings
Criteria for adjacency array construction established
Impact of different algebraic operations analyzed
Practical examples using music metadata provided
Abstract
Graph construction, a fundamental operation in a data processing pipeline, is typically done by multiplying the incidence array representations of a graph, $\mathbf{E}_{\mathrm{in}}$ and $\mathbf{E}_{\mathrm{out}}$, to produce an adjacency array of the graph, $\mathbf{A}$, that can be processed with a variety of algorithms. This paper provides the mathematical criteria to determine if the product $\mathbf{A} = \mathbf{E}_{\mathrm{out}}^{\mathsf T}\mathbf{E}_{\mathrm{in}}$ will have the required structure of the adjacency array of the graph. The values in the resulting adjacency array are determined by the corresponding addition $\oplus$ and multiplication $\otimes$ operations used to perform the array multiplication. Illustrations of the various results possible from different $\oplus$ and $\otimes$ operations are provided using a small collection of popular music metadata.
Constructing Adjacency Arrays from Incidence Arrays
Hayden Jananthan (1,2), Karia Dibert (2,3), Jeremy Kepner (2,3,4)
(1) Vanderbilt University Mathematics Department, (2) MIT Lincoln Laboratory Supercomputing Center,
(3) MIT Mathematics Department, (4) MIT Computer Science & AI Laboratory
Index Terms:
graph; incidence array; adjacency array; semiring
I Introduction
This material is based in part upon work supported by the NSF under grant number DMS-1312831. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
The duality between the canonical representation of graphs as abstract collections of vertices and edges and a matrix representation has been a part of graph theory since its inception [Konig 1931, Konig 1936]. Matrix algebra has been recognized as a useful tool in graph theory for nearly as long [Harary 1969, Sabadusi 1960, Weischel 1962, McAndrew 1963, Teh & Yap 1964, McAndrew 1965, Harary & Tauth 1964, Brualdi 1967]. The modern description of the duality between graph algorithms and matrix mathematics (or sparse linear algebra) has been extensively covered in the recent literature [Kepner & Gilbert 2011] and has further spawned the GraphBLAS math library standard (GraphBLAS.org) [Mattson et al 2013], which has been developed in a series of proceedings [Mattson 2014a, Mattson 2014b, Mattson 2015, Buluç 2015, Mattson 2016] and implementations [Buluç & Gilbert 2011, Kepner et al 2012, Ekanadham et al 2014, Hutchison et al 2015, Anderson et al 2016, Zhang et al 2016].
Adjacency arrays, typically denoted $\mathbf{A}$, have much in common with adjacency matrices. Likewise, incidence arrays or edge arrays, typically denoted $\mathbf{E}$, have much in common with incidence matrices [Bruck & Ryser 1949, Ford & Fulkerson 1962, Fulkerson & Gross 1965, Fisher & Wing 1965], edge matrices [Dobrjanskyj & Freudenstein 1967], adjacency lists [Bodin & Kursh 1979], and adjacency structures [Tarjan 1972]. The powerful link between adjacency arrays and incidence arrays via array multiplication is the focus of the first part of this paper.
Incidence arrays are often readily obtained from raw data. In many cases, an associative array representing a spreadsheet or database table is already in the form of an incidence array. However, to analyze a graph, it is often convenient to represent the graph as an adjacency array. Constructing an adjacency array from data stored in an incidence array via array multiplication is one of the most common and important steps in a data processing system.
Given a graph $G$ with vertex set $K_{\mathrm{out}} \cup K_{\mathrm{in}}$ and edge set $K$, the construction of adjacency arrays for $G$ relies on the assumption that $\mathbf{E}_{\mathrm{out}}^{\mathsf T}\mathbf{E}_{\mathrm{in}}$ is an adjacency array of $G$. This assumption is certainly true in the most common case where the value set is composed of non-negative reals and the operations $\oplus$ and $\otimes$ are arithmetic plus ($+$) and arithmetic times ($\times$), respectively. However, one hallmark of associative arrays is their ability to contain nontraditional data as values. For these value sets, $\oplus$ and $\otimes$ may be redefined to operate on non-numerical values. For example, for the value set of all alphanumeric strings, with
$$\oplus = \max, \qquad \otimes = \min,$$
it is not immediately apparent whether $\mathbf{E}_{\mathrm{out}}^{\mathsf T}\mathbf{E}_{\mathrm{in}}$ is an adjacency array of the graph whose set of vertices is $K_{\mathrm{out}} \cup K_{\mathrm{in}}$. In the subsequent sections, the criteria on the value set $V$ and the operations $\oplus$ and $\otimes$ are presented so that
$$\mathbf{A} = \mathbf{E}_{\mathrm{out}}^{\mathsf T}\mathbf{E}_{\mathrm{in}}$$
always produces an adjacency array [Dibert et al 2015].
I-A Definitions
For a directed graph (from here onwards, just ‘graph’) $G$, $K_{\mathrm{out}}$ will denote the set of vertices which are the sources of edges, $K_{\mathrm{in}}$ will denote the set of vertices which are the targets of edges, and $K$ will denote the set of edges. The vertex set of $G$ will be assumed to be $K_{\mathrm{out}} \cup K_{\mathrm{in}}$. $K_{\mathrm{out}}$, $K_{\mathrm{in}}$, and $K$ are assumed to be finite and totally-ordered.
$V$ will denote the set of values that the data can take on, such as non-negative real numbers or the elements of an ordered set. $\oplus$ and $\otimes$ are binary operations on $V$ (in particular, $V$ is closed under the operations $\oplus$ and $\otimes$), such as $+$ and $\times$ or $\max$ and $\min$. $\oplus$ and $\otimes$ each have identity elements $0$ and $1$, respectively, i.e.
$$v \oplus 0 = 0 \oplus v = v \quad\text{and}\quad v \otimes 1 = 1 \otimes v = v$$
for all $v \in V$.
For the purposes of understanding what algebraic properties are required for $\mathbf{E}_{\mathrm{out}}^{\mathsf T}\mathbf{E}_{\mathrm{in}}$ to be an adjacency array of a graph, $\oplus$ and $\otimes$ will not be assumed to be associative or commutative, and $\otimes$ does not necessarily distribute over $\oplus$, nor is $0$ assumed to be an annihilator of $\otimes$.
Definition I.1 (Associative Array).
An associative array is a map $\mathbf{A}: K_1 \times K_2 \to V$, where $K_1$ and $K_2$ are finite totally-ordered sets, referred to as key sets and whose elements are called keys, and $V$ is the value set.
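Definition I.2 (Transpose).
If $\mathbf{A}: K_1 \times K_2 \to V$ is an associative array, then $\mathbf{A}^{\mathsf T}: K_2 \times K_1 \to V$ is the associative array defined as
$$\mathbf{A}^{\mathsf T}(k_2, k_1) = \mathbf{A}(k_1, k_2)$$
where $k_1 \in K_1$ and $k_2 \in K_2$.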
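Definition I.3 (Array Multiplication).
Multiplication of associative arrays is defined as
$$\mathbf{C} = \mathbf{A}\mathbf{B} = \mathbf{A} \oplus.\otimes \mathbf{B}$$
or more specifically
$$\mathbf{C}(k_1, k_2) = \bigoplus_{k_3 \in K_3} \mathbf{A}(k_1, k_3) \otimes \mathbf{B}(k_3, k_2)$$
where $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ are associative arrays
$$\mathbf{A}: K_1 \times K_3 \to V, \qquad \mathbf{B}: K_3 \times K_2 \to V, \qquad \mathbf{C}: K_1 \times K_2 \to V$$
and $k_1 \in K_1$, $k_2 \in K_2$, $k_3 \in K_3$.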
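The array multiplication of Definition I.3 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the dictionary storage, the function name `array_multiply`, and the sample arrays are assumptions introduced here. Absent keys are treated as holding the zero element, and the terms are folded in key order because $\oplus$ is not assumed to be associative or commutative.

```python
import operator
from functools import reduce


def array_multiply(A, B, add, mul, zero):
    """Return C = A add.mul B for associative arrays stored as dicts.

    A maps (k1, k3) -> value and B maps (k3, k2) -> value. Keys missing
    from a dict are treated as holding `zero`, the identity of `add`.
    """
    rows = sorted({k1 for k1, _ in A})
    inner = sorted({k3 for _, k3 in A} | {k3 for k3, _ in B})
    cols = sorted({k2 for _, k2 in B})
    C = {}
    for k1 in rows:
        for k2 in cols:
            terms = [mul(A.get((k1, k3), zero), B.get((k3, k2), zero))
                     for k3 in inner]
            # Fold in key order: add is not assumed associative or commutative.
            C[(k1, k2)] = reduce(add, terms, zero)
    return C


# Hypothetical 1x2 and 2x1 arrays.
A = {("r", "x"): 1, ("r", "y"): 2}
B = {("x", "c"): 3, ("y", "c"): 4}

plus_times = array_multiply(A, B, operator.add, operator.mul, 0)
max_min = array_multiply(A, B, max, min, 0)
print(plus_times)  # {('r', 'c'): 11}
print(max_min)     # {('r', 'c'): 2}
```

The same call signature serves both operator pairs; only `add`, `mul`, and `zero` change.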
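Definition I.4 (Incidence Arrays).
If $G$ is a graph with vertex set $K_{\mathrm{out}} \cup K_{\mathrm{in}}$ and edge set $K$, then
$\mathbf{E}_{\mathrm{out}}: K \times K_{\mathrm{out}} \to V$ is a source incidence array if $\mathbf{E}_{\mathrm{out}}(k, a) \neq 0$ if and only if the edge $k$ is directed outward from the vertex $a$
$\mathbf{E}_{\mathrm{in}}: K \times K_{\mathrm{in}} \to V$ is a target incidence array if $\mathbf{E}_{\mathrm{in}}(k, b) \neq 0$ if and only if the edge $k$ is directed into the vertex $b$.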
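Definition I.5 (Adjacency Array).
If $G$ is a graph with vertex set $K_{\mathrm{out}} \cup K_{\mathrm{in}}$ and edge set $K$, then $\mathbf{A}: K_{\mathrm{out}} \times K_{\mathrm{in}} \to V$ is an adjacency array if $\mathbf{A}(a, b) \neq 0$ if and only if there is an edge with source $a$ and target $b$.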
II Adjacency Array Construction
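If $\mathbf{A}$ is an adjacency array for a graph $G$, then $\mathbf{A}(a, b) \neq 0$ if and only if there is an edge $k$ with source $a$ and target $b$, i.e. so that $\mathbf{E}_{\mathrm{out}}(k, a) \neq 0$ and $\mathbf{E}_{\mathrm{in}}(k, b) \neq 0$. In the case where the product of two non-zero values is non-zero, this can be subsumed to say that $\mathbf{A}(a, b) \neq 0$ if and only if $\mathbf{E}_{\mathrm{out}}(k, a) \otimes \mathbf{E}_{\mathrm{in}}(k, b) \neq 0$ for some edge $k$. This condition can be written as
$$\mathbf{E}_{\mathrm{out}}^{\mathsf T}(a, k) \otimes \mathbf{E}_{\mathrm{in}}(k, b) \neq 0.$$
This latter expression looks like a term in the evaluation
$$\mathbf{E}_{\mathrm{out}}^{\mathsf T}\mathbf{E}_{\mathrm{in}}(a, b) = \bigoplus_{k \in K} \mathbf{E}_{\mathrm{out}}^{\mathsf T}(a, k) \otimes \mathbf{E}_{\mathrm{in}}(k, b)$$
but the introduction of more terms means that more assumptions need to be made about the relationships between $\oplus$, $\otimes$ and $0$.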
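Theorem II.1.
Let $V$ be a set with closed binary operations $\oplus, \otimes$ with identities $0, 1 \in V$. Then the following are equivalent:
(i) $\oplus$ and $\otimes$ satisfy the properties
  (a) Zero-Sum-Free: $v \oplus w = 0$ if and only if $v = w = 0$,
  (b) No Zero Divisors: $v \otimes w = 0$ if and only if $v = 0$ or $w = 0$, and
  (c) $0$ is Annihilator for $\otimes$: $v \otimes 0 = 0 \otimes v = 0$.
(ii) If $G$ is a graph with out-vertex and in-vertex incidence arrays $\mathbf{E}_{\mathrm{out}}: K \times K_{\mathrm{out}} \to V$ and $\mathbf{E}_{\mathrm{in}}: K \times K_{\mathrm{in}} \to V$, then $\mathbf{E}_{\mathrm{out}}^{\mathsf T}\mathbf{E}_{\mathrm{in}}$ is an adjacency array for $G$.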
Proof.
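Let $\mathbf{A} = \mathbf{E}_{\mathrm{out}}^{\mathsf T}\mathbf{E}_{\mathrm{in}}$.
As above, for $\mathbf{A}$ to be the adjacency array of $G$, the entry $\mathbf{A}(a, b)$ must be nonzero if and only if there is an edge from $a$ to $b$, which is equivalent to saying that the entry must be nonzero if and only if there is a $k \in K$ such that
$$\mathbf{E}_{\mathrm{out}}(k, a) \neq 0 \quad\text{and}\quad \mathbf{E}_{\mathrm{in}}(k, b) \neq 0.$$
Taken altogether, the above pair of equations imply
$$\bigoplus_{k \in K} \mathbf{E}_{\mathrm{out}}^{\mathsf T}(a, k) \otimes \mathbf{E}_{\mathrm{in}}(k, b) \neq 0 \iff \exists k \in K \text{ such that } \mathbf{E}_{\mathrm{out}}(k, a) \neq 0 \text{ and } \mathbf{E}_{\mathrm{in}}(k, b) \neq 0.$$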
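First, the above condition can be restated in a form that more easily provides the zero-sum-freeness of $\oplus$, lack of zero-divisors for $\otimes$, and the fact that $0$ annihilates under $\otimes$. Equation II is equivalent to
$$\bigoplus_{k \in K} \mathbf{E}_{\mathrm{out}}^{\mathsf T}(a, k) \otimes \mathbf{E}_{\mathrm{in}}(k, b) = 0 \iff \nexists k \in K \text{ such that } \mathbf{E}_{\mathrm{out}}(k, a) \neq 0 \text{ and } \mathbf{E}_{\mathrm{in}}(k, b) \neq 0,$$
which in turn is equivalent to
$$\bigoplus_{k \in K} \mathbf{E}_{\mathrm{out}}^{\mathsf T}(a, k) \otimes \mathbf{E}_{\mathrm{in}}(k, b) = 0 \iff \forall k \in K,\ \mathbf{E}_{\mathrm{out}}(k, a) = 0 \text{ or } \mathbf{E}_{\mathrm{in}}(k, b) = 0.$$
This expression may be split up into two conditional statements
$$\bigoplus_{k \in K} \mathbf{E}_{\mathrm{out}}^{\mathsf T}(a, k) \otimes \mathbf{E}_{\mathrm{in}}(k, b) = 0 \implies \forall k \in K,\ \mathbf{E}_{\mathrm{out}}(k, a) = 0 \text{ or } \mathbf{E}_{\mathrm{in}}(k, b) = 0 \tag{3}$$
and
$$\forall k \in K,\ \mathbf{E}_{\mathrm{out}}(k, a) = 0 \text{ or } \mathbf{E}_{\mathrm{in}}(k, b) = 0 \implies \bigoplus_{k \in K} \mathbf{E}_{\mathrm{out}}^{\mathsf T}(a, k) \otimes \mathbf{E}_{\mathrm{in}}(k, b) = 0 \tag{4}$$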
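Lemma II.2.
Equation 3 implies that $\oplus$ is zero-sum-free.
Proof.
Suppose there exist nonzero $v, w \in V$ such that $v \oplus w = 0$, that is, that nontrivial additive inverses exist. Then it is possible to choose a graph $G$ to have edge set $\{k_1, k_2\}$ and vertex set $\{a, b\}$, where both $k_1$ and $k_2$ start from $a$ and end at $b$. Then defining
$$\mathbf{E}_{\mathrm{out}}(k_1, a) = v, \qquad \mathbf{E}_{\mathrm{out}}(k_2, a) = w, \qquad \mathbf{E}_{\mathrm{in}}(k_1, b) = \mathbf{E}_{\mathrm{in}}(k_2, b) = 1$$
provides proper out-vertex and in-vertex incidence arrays for $G$. Moreover, it is the case that
$$\mathbf{E}_{\mathrm{out}}^{\mathsf T}\mathbf{E}_{\mathrm{in}}(a, b) = (v \otimes 1) \oplus (w \otimes 1) = v \oplus w = 0,$$
which contradicts Equation 3. Therefore, no such nonzero $v$ and $w$ may be present in $V$, meaning it is necessary that $\oplus$ be zero-sum-free. ∎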
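Lemma II.3.
Equation 3 implies that $\otimes$ has no zero-divisors.
Proof.
Suppose $v, w \in V$ are nonzero. Define the graph $G$ to have edge set $\{k\}$ and vertex set $\{a\}$ with a single self-loop given by $k$. Then define
$$\mathbf{E}_{\mathrm{out}}(k, a) = v, \qquad \mathbf{E}_{\mathrm{in}}(k, a) = w$$
to obtain out-vertex and in-vertex incidence arrays for $G$. Then
$$\mathbf{E}_{\mathrm{out}}^{\mathsf T}\mathbf{E}_{\mathrm{in}}(a, a) = v \otimes w.$$
Thus, Equation 3 implies that $v \otimes w \neq 0$, and hence $\otimes$ has no zero-divisors. ∎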
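Lemma II.4.
Equations 3 and 4 imply that $0$ annihilates under $\otimes$.
Proof.
Suppose $v, w \in V$ are nonzero. Define the graph $G$ to have edge set $\{k_1, k_2\}$ and vertex set $\{a, b\}$, with self-loops at $a$ and $b$ given by $k_1$ and $k_2$, respectively. Defining
$$\mathbf{E}_{\mathrm{out}}(k_1, a) = \mathbf{E}_{\mathrm{in}}(k_1, a) = v$$
and
$$\mathbf{E}_{\mathrm{out}}(k_2, b) = \mathbf{E}_{\mathrm{in}}(k_2, b) = w$$
(and all other entries in $\mathbf{E}_{\mathrm{out}}$ and $\mathbf{E}_{\mathrm{in}}$ equal to $0$) results in out-vertex and in-vertex incidence arrays of $G$. Moreover, since $G$ has no edge from $a$ to $b$, Equation 4 gives
$$\mathbf{E}_{\mathrm{out}}^{\mathsf T}\mathbf{E}_{\mathrm{in}}(a, b) = (v \otimes 0) \oplus (0 \otimes w) = 0.$$
By Lemma II.2, $\oplus$ is zero-sum-free so it follows that $v \otimes 0 = 0 \otimes w = 0$. Thus, $0$ is an annihilator for $\otimes$. ∎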
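Now Theorem II.1(i) is shown to be sufficient for Theorem II.1(ii) to hold. Assume that zero is an annihilator, $\oplus$ is zero-sum-free, and $\otimes$ has no zero-divisors. Zero-sum-freeness and the nonexistence of zero divisors give
$$\exists k \in K \text{ such that } \mathbf{E}_{\mathrm{out}}(k, a) \neq 0 \text{ and } \mathbf{E}_{\mathrm{in}}(k, b) \neq 0 \implies \bigoplus_{k \in K} \mathbf{E}_{\mathrm{out}}^{\mathsf T}(a, k) \otimes \mathbf{E}_{\mathrm{in}}(k, b) \neq 0,$$
which is the contrapositive of Equation 3. And, that zero is an annihilator gives
$$\forall k \in K,\ \mathbf{E}_{\mathrm{out}}(k, a) = 0 \text{ or } \mathbf{E}_{\mathrm{in}}(k, b) = 0 \implies \bigoplus_{k \in K} \mathbf{E}_{\mathrm{out}}^{\mathsf T}(a, k) \otimes \mathbf{E}_{\mathrm{in}}(k, b) = 0,$$
which is Equation 4. As Equation 3 and Equation 4 combine to form Equation II, it is established that the conditions are sufficient for Equation II. ∎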
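The role of zero-sum-freeness in the theorem can be checked numerically. The sketch below is an illustration under stated assumptions (dictionary storage, the helper names `adjacency` and `nonzero_pattern`, and the sample weights are all introduced here): with non-negative weights under $+.\times$ the edge from $a$ to $b$ appears in the product, while over the integers two parallel edges with weights $1$ and $-1$ cancel and the edge is lost.

```python
import operator
from functools import reduce


def adjacency(E_out, E_in, add, mul, zero):
    """A(a, b) = fold under add, over edges k, of E_out(k, a) mul E_in(k, b)."""
    edges = sorted({k for k, _ in E_out} | {k for k, _ in E_in})
    sources = sorted({a for _, a in E_out})
    targets = sorted({b for _, b in E_in})
    return {
        (a, b): reduce(
            add,
            [mul(E_out.get((k, a), zero), E_in.get((k, b), zero)) for k in edges],
            zero,
        )
        for a in sources
        for b in targets
    }


def nonzero_pattern(A, zero):
    """Set of (source, target) pairs whose entry differs from zero."""
    return {key for key, value in A.items() if value != zero}


# Two parallel edges k1, k2 from vertex a to vertex b.
E_in = {("k1", "b"): 1, ("k2", "b"): 1}

# Non-negative weights: zero-sum-free, so the edge a -> b is detected.
E_out_ok = {("k1", "a"): 1, ("k2", "a"): 2}
A_ok = adjacency(E_out_ok, E_in, operator.add, operator.mul, 0)

# Integer weights with an additive inverse: the two terms cancel.
E_out_bad = {("k1", "a"): 1, ("k2", "a"): -1}
A_bad = adjacency(E_out_bad, E_in, operator.add, operator.mul, 0)

print(nonzero_pattern(A_ok, 0))   # {('a', 'b')}
print(nonzero_pattern(A_bad, 0))  # set()
```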
III Adjacency Array of Reverse Graph
The remaining product of the incidence arrays that is defined is $\mathbf{E}_{\mathrm{in}}^{\mathsf T}\mathbf{E}_{\mathrm{out}}$. The above requirements will now be shown to be necessary and sufficient for the remaining product to be the adjacency array of the reverse of the graph. Recall that the reverse of $G$ is the graph $\bar{G}$ in which all the arrows in $G$ have been reversed. Let $G$ be a graph with incidence matrices $\mathbf{E}_{\mathrm{out}}$ and $\mathbf{E}_{\mathrm{in}}$.
Corollary III.1.
The conditions in Theorem II.1(i) are necessary and sufficient so that $\mathbf{E}_{\mathrm{in}}^{\mathsf T}\mathbf{E}_{\mathrm{out}}$ is an adjacency matrix of the reverse of $G$.
Proof.
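Let $\bar{G}$ denote the reverse of $G$, and let $\bar{\mathbf{E}}_{\mathrm{out}}$ and $\bar{\mathbf{E}}_{\mathrm{in}}$ be out-vertex and in-vertex incidence arrays for $\bar{G}$, respectively. Recall that $\bar{G}$ is defined to have the same edge and vertex sets as $G$ but changes the directions of the edges. In other words, if an edge $k$ leaves a vertex $a$ in $G$, then it enters $a$ in $\bar{G}$, and vice versa. As such, $\bar{\mathbf{E}}_{\mathrm{out}}(k, a) \neq 0$ if and only if $\mathbf{E}_{\mathrm{in}}(k, a) \neq 0$, and likewise $\bar{\mathbf{E}}_{\mathrm{in}}(k, a) \neq 0$ if and only if $\mathbf{E}_{\mathrm{out}}(k, a) \neq 0$. As such, choosing $\bar{\mathbf{E}}_{\mathrm{out}} = \mathbf{E}_{\mathrm{in}}$ and $\bar{\mathbf{E}}_{\mathrm{in}} = \mathbf{E}_{\mathrm{out}}$ gives valid out-vertex and in-vertex incidence matrices for $\bar{G}$, respectively. Then by Theorem II.1 it can be shown that
$$\bar{\mathbf{E}}_{\mathrm{out}}^{\mathsf T}\bar{\mathbf{E}}_{\mathrm{in}} = \mathbf{E}_{\mathrm{in}}^{\mathsf T}\mathbf{E}_{\mathrm{out}}$$
is an adjacency array of $\bar{G}$.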
∎
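As a small check of the corollary, the sketch below builds both products for a hypothetical three-edge graph over the non-negative integers with $+.\times$ and compares their nonzero patterns. The storage format and the helper name `transpose_multiply` are assumptions made for this illustration.

```python
import operator
from functools import reduce


def transpose_multiply(E1, E2, add, mul, zero):
    """Return E1^T E2, keeping only the entries that differ from zero."""
    edges = sorted({k for k, _ in E1} | {k for k, _ in E2})
    rows = sorted({a for _, a in E1})
    cols = sorted({b for _, b in E2})
    product = {}
    for a in rows:
        for b in cols:
            terms = [mul(E1.get((k, a), zero), E2.get((k, b), zero)) for k in edges]
            value = reduce(add, terms, zero)
            if value != zero:
                product[(a, b)] = value
    return product


# Edges: k1 is a -> b, k2 is b -> c, k3 is a -> c (all weights 1).
E_out = {("k1", "a"): 1, ("k2", "b"): 1, ("k3", "a"): 1}
E_in = {("k1", "b"): 1, ("k2", "c"): 1, ("k3", "c"): 1}

A = transpose_multiply(E_out, E_in, operator.add, operator.mul, 0)
A_reverse = transpose_multiply(E_in, E_out, operator.add, operator.mul, 0)

print(sorted(A))          # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(sorted(A_reverse))  # [('b', 'a'), ('c', 'a'), ('c', 'b')]
```

Every edge recorded in `A` appears in `A_reverse` with source and target swapped.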
It is now straightforward to identify algebraic structures that comply with the established criteria. Notably, all zero-sum-free semirings with no zero-divisors comply, such as $\mathbb{N}$ or $\mathbb{R}_{\geq 0}$ with the standard addition and multiplication. In addition, any linearly ordered set with $\oplus$ and $\otimes$ given by $\max$ and $\min$, respectively, complies, with the least and greatest elements serving as $0$ and $1$. Some non-examples, however, include the max-plus algebra or non-trivial Boolean algebras, which do not satisfy the zero-product property, or rings, which except for the zero ring are not zero-sum-free. Furthermore, the value sets of associative arrays need not be defined exclusively as semirings, as several semiring-like structures satisfy the criteria. These structures may lack the properties of additive or multiplicative commutativity, additive or multiplicative associativity, or distributivity of multiplication over addition, which are not necessary to ensure that the product of incidence arrays yields an adjacency array.
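The Boolean-algebra non-example can be made concrete with sets of labels under union and intersection. The following sketch is illustrative only: the labels, the single self-loop graph, and the helper name `adjacency` are assumptions. Two disjoint nonempty sets are zero divisors under intersection, so an existing edge produces an empty (zero) adjacency entry.

```python
from functools import reduce

EMPTY = frozenset()  # zero element: identity of union, annihilator of intersection


def adjacency(E_out, E_in):
    """A(a, b) = union over edges k of E_out(k, a) & E_in(k, b)."""
    edges = sorted({k for k, _ in E_out} | {k for k, _ in E_in})
    sources = sorted({a for _, a in E_out})
    targets = sorted({b for _, b in E_in})
    return {
        (a, b): reduce(
            frozenset.union,
            [E_out.get((k, a), EMPTY) & E_in.get((k, b), EMPTY) for k in edges],
            EMPTY,
        )
        for a in sources
        for b in targets
    }


# A single self-loop k at vertex a.
E_out = {("k", "a"): frozenset({"x"})}

# Disjoint nonempty values: the edge exists but the entry is empty.
A_lost = adjacency(E_out, {("k", "a"): frozenset({"y"})})

# Overlapping values: the edge is detected.
A_kept = adjacency(E_out, {("k", "a"): frozenset({"x", "y"})})

print(A_lost)  # {('a', 'a'): frozenset()}
print(A_kept)  # {('a', 'a'): frozenset({'x'})}
```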
The criteria guarantee an accurate adjacency array for any dataset that satisfies them, regardless of value distribution in the incidence arrays. However, if the incidence arrays are known to possess a certain structure, it is possible to circumvent some of the conditions and still always produce adjacency arrays. For example, if each key set of an undirected incidence array $\mathbf{E}$ is a list of documents and the array entries are sets of words shared by documents, then a word in $\mathbf{E}(i, j)$ and $\mathbf{E}(m, n)$ must also be in $\mathbf{E}(i, n)$ and $\mathbf{E}(m, j)$. This structure means that when multiplying $\mathbf{E}^{\mathsf T}\mathbf{E}$ using $\oplus = \cup$ and $\otimes = \cap$, a nonempty set will never be “multiplied” by (intersected with) a disjoint nonempty set. This eliminates the need for the zero-product property to be satisfied, as every multiplication of nonempty sets is already guaranteed to produce a nonempty set. Each entry of the array produced will contain the set of words shared by the two documents that index it.
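Though the criteria ensure that the product of incidence arrays will be an adjacency array, they do not ensure that certain matrix properties hold. For example, the property $(\mathbf{A}\mathbf{B})^{\mathsf T} = \mathbf{B}^{\mathsf T}\mathbf{A}^{\mathsf T}$ may be violated under these criteria, as $(\mathbf{E}_{\mathrm{out}}^{\mathsf T}\mathbf{E}_{\mathrm{in}})^{\mathsf T}$ is not necessarily equal to $\mathbf{E}_{\mathrm{in}}^{\mathsf T}\mathbf{E}_{\mathrm{out}}$. (For this matrix transpose property to always hold, the operation $\otimes$ would have to be commutative.)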
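A non-commutative $\otimes$ that still meets the three criteria makes the failure of the transpose property visible. The sketch below uses a hypothetical value set chosen for this illustration: strings with an adjoined zero element (`None`), $\oplus$ as lexicographic maximum, and $\otimes$ as concatenation with the empty string as its identity. The nonzero patterns of the two arrays agree, but the values do not.

```python
from functools import reduce

ZERO = None  # hypothetical zero element adjoined to the set of strings


def add(x, y):
    """Lexicographic max, with ZERO as the identity."""
    if x is ZERO:
        return y
    if y is ZERO:
        return x
    return max(x, y)


def mul(x, y):
    """String concatenation, with ZERO as annihilator and "" as identity."""
    if x is ZERO or y is ZERO:
        return ZERO
    return x + y


def transpose_multiply(E1, E2):
    """Return E1^T E2 under add and mul, folding edges in sorted order."""
    edges = sorted({k for k, _ in E1} | {k for k, _ in E2})
    rows = sorted({a for _, a in E1})
    cols = sorted({b for _, b in E2})
    return {
        (a, b): reduce(add, [mul(E1.get((k, a), ZERO), E2.get((k, b), ZERO))
                             for k in edges], ZERO)
        for a in rows
        for b in cols
    }


def transpose(A):
    """Swap the two keys of every entry."""
    return {(k2, k1): value for (k1, k2), value in A.items()}


# One edge k from vertex a to vertex b.
E_out = {("k", "a"): "x"}
E_in = {("k", "b"): "y"}

left = transpose(transpose_multiply(E_out, E_in))  # (E_out^T E_in)^T
right = transpose_multiply(E_in, E_out)            # E_in^T E_out

print(left)   # {('b', 'a'): 'xy'}
print(right)  # {('b', 'a'): 'yx'}
```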
IV Graph Construction with Different Semirings
The ability to change $\oplus$ and $\otimes$ operations allows different graph adjacency arrays to be constructed using the same element-wise addition, element-wise multiplication, and array multiplication syntax. Specific pairs of operations are best suited for constructing certain types of adjacency arrays. The pattern of edges resulting from array multiplication of incidence arrays is generally preserved for various semirings. However, the non-zero values assigned to the edges can be very different and enable the construction of different graphs.
For example, constructing an adjacency array of the graph of music writers connected to music genres from Figure 1 begins with selecting the incidence sub-arrays $\mathbf{E}_1$ and $\mathbf{E}_2$ as shown in Figure 2. Array multiplication of $\mathbf{E}_1^{\mathsf T}$ with $\mathbf{E}_2$ produces the desired adjacency array of the graph. Figure 3 illustrates this array multiplication for different operator pairs $\oplus$ and $\otimes$.
The pattern of edges among vertices in the adjacency arrays shown in Figure 3 is the same for the different operator pairs, but the edge weights differ. All the non-zero values in $\mathbf{E}_1$ and $\mathbf{E}_2$ are 1. All the $\otimes$ operators in Figure 3 have the property
$$0 \otimes 1 = 1 \otimes 0 = 0$$
for their respective values of zero, be it $0$, $-\infty$, or $\infty$. Likewise, all the $\otimes$ operators in Figure 3 also have the property
$$1 \otimes 1 = 1$$
except where $\otimes = +$, in which case
$$1 \otimes 1 = 2$$
The differences in the adjacency array weights are less pronounced than if the values of $\mathbf{E}_1$ and $\mathbf{E}_2$ were more diverse. The most apparent difference is between the $+.\times$ semiring and the other semirings in Figure 3. In the case of the $+.\times$ semiring, the $\oplus$ operation $+$ aggregates values from all the edges between two vertices. Additional positive edges will increase the overall weight in the adjacency array. In the other pairs of operations, the $\oplus$ operator is either $\max$ or $\min$, which effectively selects only one edge weight to use for assigning the overall weight. Additional edges will only impact the edge weight in the adjacency array if the new edge is an appropriate maximum or minimum value. Thus, $+.\times$ constructs adjacency arrays that aggregate all the edges. The other semirings construct adjacency arrays that select extremal edges. Each can be useful for constructing graph adjacency arrays in the appropriate context.
The impact of different semirings on the graph adjacency array weights is more pronounced if the values of $\mathbf{E}_1$ and $\mathbf{E}_2$ are more diverse. Figure 4 modifies $\mathbf{E}_1$ so that a value of 2 is given to the non-zero values in the column GenrePop and a value of 3 is given to the non-zero values in the column GenreRock.
Figure 5 shows the results of constructing adjacency arrays with $\mathbf{E}_1$ and $\mathbf{E}_2$ using different semirings. The impact of changing the values in $\mathbf{E}_1$ can be seen by comparing Figure 3 with Figure 5. For the $+.\times$ semiring, the values in the adjacency array rows GenrePop and GenreRock are multiplied by 2 and 3. The increased adjacency array values for these rows are a result of the $\otimes$ operator being arithmetic multiplication so that
$$2 \otimes 1 = 2 \times 1 = 2, \qquad 3 \otimes 1 = 3 \times 1 = 3$$
For the $\max.+$ and $\min.+$ semirings, the values in the adjacency array rows GenrePop and GenreRock are larger by 1 and 2. The larger values in the adjacency array of these rows are due to the $\otimes$ operator being arithmetic addition, resulting in
$$2 \otimes 1 = 2 + 1 = 3, \qquad 3 \otimes 1 = 3 + 1 = 4$$
For the $\max.\min$ semiring, Figure 3 and Figure 5 have the same adjacency array because $\mathbf{E}_2$ is unchanged. The $\otimes$ operator corresponding to the minimum value function continues to select the smaller non-zero values from $\mathbf{E}_2$
$$2 \otimes 1 = \min(2, 1) = 1, \qquad 3 \otimes 1 = \min(3, 1) = 1$$
In contrast, for the $\min.\max$ semiring, the values in the adjacency array rows GenrePop and GenreRock are larger by 1 and 2. The increase in adjacency array values for these rows is a result of the $\otimes$ operator selecting the larger non-zero values from $\mathbf{E}_1$
$$2 \otimes 1 = \max(2, 1) = 2, \qquad 3 \otimes 1 = \max(3, 1) = 3$$
Finally, for the $\max.\times$ and $\min.\times$ semirings, the values in the adjacency array rows GenrePop and GenreRock are increased by 1 and 2. Similar to the $+.\times$ semiring, the larger adjacency array values for these rows are a result of the $\otimes$ operator being arithmetic multiplication, resulting in
$$2 \otimes 1 = 2 \times 1 = 2, \qquad 3 \otimes 1 = 3 \times 1 = 3$$
Figures 3 and 5 show that a wide range of graph adjacency arrays can be constructed via array multiplication of incidence arrays over different semirings. A synopsis of the graph constructions illustrated in Figures 3 and 5 is as follows:
$+.\times$: sum of products of edge weights connecting two vertices; computes the strength of all connections between two connected vertices.
$\max.\times$: maximum of products of edge weights connecting two vertices; selects the edge with largest weighted product of all the edges connecting two vertices.
$\min.\times$: minimum of products of edge weights connecting two vertices; selects the edge with smallest weighted product of all the edges connecting two vertices.
$\max.+$: maximum of sums of edge weights connecting two vertices; selects the edge with largest weighted sum of all the edges connecting two vertices.
$\min.+$: minimum of sums of edge weights connecting two vertices; selects the edge with smallest weighted sum of all the edges connecting two vertices.
$\max.\min$: maximum of the minimum of weights connecting two vertices; selects the largest of all the shortest connections between two vertices.
$\min.\max$: minimum of the maximum of weights connecting two vertices; selects the smallest of all the largest connections between two vertices.
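The seven constructions listed above can be compared side by side on a toy example. The sketch below does not reproduce the data of Figures 1 through 5; the three tracks, the single writer `WriterA`, and the helper name `transpose_multiply` are hypothetical. The genre weights 2 and 3 follow the modification described for Figure 4, and each operator pair is paired with the zero element assumed here to be appropriate for positive weights.

```python
import operator
from functools import reduce

INF = float("inf")

# (name, add, mul, zero): zero is the identity of add and annihilates mul
# for the positive weights used below.
OPERATOR_PAIRS = [
    ("+.x", operator.add, operator.mul, 0),
    ("max.x", max, operator.mul, 0),
    ("min.x", min, operator.mul, INF),
    ("max.+", max, operator.add, -INF),
    ("min.+", min, operator.add, INF),
    ("max.min", max, min, 0),
    ("min.max", min, max, INF),
]


def transpose_multiply(E1, E2, add, mul, zero):
    """Return E1^T E2, keeping only the entries that differ from zero."""
    edges = sorted({k for k, _ in E1} | {k for k, _ in E2})
    rows = sorted({a for _, a in E1})
    cols = sorted({b for _, b in E2})
    product = {}
    for a in rows:
        for b in cols:
            terms = [mul(E1.get((k, a), zero), E2.get((k, b), zero)) for k in edges]
            value = reduce(add, terms, zero)
            if value != zero:
                product[(a, b)] = value
    return product


# Hypothetical incidence arrays: rows are tracks, columns are genres or writers.
E1 = {("t1", "GenrePop"): 2, ("t2", "GenrePop"): 2, ("t3", "GenreRock"): 3}
E2 = {("t1", "WriterA"): 1, ("t2", "WriterA"): 1, ("t3", "WriterA"): 1}

results = {name: transpose_multiply(E1, E2, add, mul, zero)
           for name, add, mul, zero in OPERATOR_PAIRS}

for name, A in results.items():
    print(name, A[("GenrePop", "WriterA")], A[("GenreRock", "WriterA")])
```

For this input the GenrePop and GenreRock entries are 4 and 3 for `+.x`, 2 and 3 for `max.x`, `min.x`, and `min.max`, 3 and 4 for `max.+` and `min.+`, and 1 and 1 for `max.min`. Only `+.x` aggregates the two tracks that connect GenrePop to the writer; the other pairs select a single extremal term.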
V Conclusion
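Graph construction, a fundamental operation in a data processing pipeline, is typically done by multiplying the incidence array representations of a graph, $\mathbf{E}_{\mathrm{in}}$ and $\mathbf{E}_{\mathrm{out}}$, to produce an adjacency array of the graph, $\mathbf{A}$. The mathematical criteria to determine if $\mathbf{A} = \mathbf{E}_{\mathrm{out}}^{\mathsf T}\mathbf{E}_{\mathrm{in}}$ will have the required structure of the adjacency array of the graph are as follows. Let $V$ be a set with closed binary operations $\oplus, \otimes$ with identities $0, 1 \in V$. Then the following are equivalent:
(i) $\oplus$ and $\otimes$ satisfy the properties
  (a) Zero-Sum-Free: $v \oplus w = 0$ if and only if $v = w = 0$,
  (b) No Zero Divisors: $v \otimes w = 0$ if and only if $v = 0$ or $w = 0$, and
  (c) $0$ is Annihilator for $\otimes$: $v \otimes 0 = 0 \otimes v = 0$.
(ii) If $G$ is a graph with out-vertex and in-vertex incidence arrays $\mathbf{E}_{\mathrm{out}}: K \times K_{\mathrm{out}} \to V$ and $\mathbf{E}_{\mathrm{in}}: K \times K_{\mathrm{in}} \to V$, then $\mathbf{E}_{\mathrm{out}}^{\mathsf T}\mathbf{E}_{\mathrm{in}}$ is an adjacency array for $G$.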
The values in the resulting adjacency array are determined by the corresponding addition and multiplication operations used to perform the array multiplication.
Acknowledgment
The authors would like to thank Paul Burkhardt, Alan Edelman, Sterling Foster, Vijay Gadepally, Sam Madden, Dave Martinez, Tom Mattson, Albert Reuther, Victor Roytburd, and Michael Stonebraker.
References
- [Anderson et al 2016] M. Anderson, N. Sundaram, N. Satish, M. Patwary, T. L. Willke, & P. Dubey, GraphPad: Optimized Graph Primitives for Parallel and Distributed Platforms, submitted
- [Bodin & Kursh 1979] L. Bodin & S. Kursh, A detailed description of a computer system for the routing and scheduling of street sweepers, Computers & Operations Research, 6(4), 181-198, 1979
- [Brualdi 1967] R.A. Brualdi, Kronecker products of fully indecomposable matrices and of ultrastrong digraphs, Journal of Combinatorial Theory, 2:135-139, 1967
- [Bruck & Ryser 1949] R. Bruck & H. Ryser, The nonexistence of certain finite projective planes, Canadian Journal of Mathematics, 1, 88-93, 1949
- [Buluç & Gilbert 2011] A. Buluç & J. Gilbert, The Combinatorial BLAS: Design, implementation, and applications, International Journal of High Performance Computing Applications (IJHPCA), 2011
- [Buluç 2015] A. Buluç, GraphBLAS Special Session, IEEE HPEC 2015, Waltham, MA
- [Dibert et al 2015] K. Dibert, H. Jansen & J. Kepner, Algebraic Conditions for Generating Accurate Adjacency Arrays, IEEE MIT Undergraduate Research Technology Conference, 2015
- [Dobrjanskyj & Freudenstein 1967] L. Dobrjanskyj & F. Freudenstein, Some applications of graph theory to the structural analysis of mechanisms, Journal of Engineering for Industry, 89(1), 153-158, 1967
