Topology Discovery Using Path Interference
Anurag Rai, Eytan Modiano

TL;DR
This paper introduces a method for inferring network topology by analyzing path interference at end nodes, providing algorithms for specific topologies and a heuristic for general cases, with improved performance over existing methods.
Contribution
It presents a novel approach to topology inference using path interference measurements and develops polynomial algorithms for trees and rings, plus a heuristic for general networks.
Findings
Algorithms for tree and ring topologies are optimal.
The heuristic outperforms existing distance-based algorithms.
Simulation results validate the effectiveness of the proposed methods.
Abstract
We consider the problem of inferring the topology of a network using the measurements available at the end nodes, without cooperation from the internal nodes. To this end, we provide a simple method to obtain path interference which identifies whether two paths in the network intersect with each other. Using this information, we formulate the topology inference problem as an integer program and develop polynomial time algorithms to solve it optimally for networks with tree and ring topologies. Finally, we use the insight developed from these algorithms to develop a heuristic for identifying general topologies. Simulation results show that our heuristic outperforms a recently proposed algorithm that uses distance measurements for topology discovery.
Laboratory for Information and Decision Systems, MIT, USA
{rai,modiano}@mit.edu
I Introduction
Knowing the topology of the underlying network can provide several advantages to the communicating hosts. For example, the topology can be used to improve the throughput and robustness of the network [18, 19], and it can be a necessary part of identifying bottlenecks and critical links in the network [20]. It can also be used to monitor the network or to simply get a picture of the underlying system. However, network owners often keep the topology information hidden due to privacy and security concerns [5]. This has led to a significant amount of research on topology discovery. We develop a new method that can be used to identify general network topologies. This method requires only the interference pattern of the paths in the network, which can be inferred from the data available at the end nodes.
Prior work on topology discovery can be divided into two main categories: algorithms that require cooperation from the internal nodes and algorithms that do not. Many algorithms for topology discovery, usually designed for the purpose of mapping the Internet, use ICMP commands like traceroute [3, 4, 5]. These methods require some level of cooperation from the network providers. Other methods, which fall under the category of network tomography [1, 2], use data that can be measured directly at the end nodes. Our method falls under this category, as we do not seek any information from the internal nodes.
In the network tomography literature significant attention has been given to the discovery of tree networks. Papers such as [7, 9, 10] use probing mechanisms to infer single source multiple destination trees. There is also some work on combining these single source trees to form a multi-source multi-destination network [8, 13]. In [12], the authors provide a method for identifying minimal trees with multiple sources and multiple destinations by using distance measurements.
In [11], the authors develop an algorithm called RGD1 that attempts to discover a general network topology. It uses sets of four nodes that share a link, called quatrets, to build an approximation of the entire network. The discovery of the quatrets and the placement of the nodes in the topology require the shortest path distance between the nodes, which is inferred using packet delay. The RGD1 algorithm is very close to ours in terms of its objective, hence we compare its performance against ours via simulation.
In order to obtain the interference pattern, we provide a simple method based on linear regression. This method uses the number of in-flight packets in the paths and the delay experienced by the packets to determine whether a given pair of paths interfere with each other. Using the resulting interference information, we formulate the topology inference problem as an integer program. We develop polynomial time algorithms to solve it optimally for networks with special topologies, namely tree or ring topology. Both of these algorithms obtain the minimal version of the network, even when the original network is not minimal. We also develop a heuristic that attempts to recover any general topology in polynomial time.
The main contributions of this paper can be summarized as follows:
- We use the interference pattern of the paths to formulate an integer linear program (ILP) that obtains the network that has the fewest number of links and supports the given interferences. The solution provides a new method to discover a general network topology.
- We provide an upper bound, a lower bound and a sufficient condition for optimality for the ILP.
- We design two polynomial time algorithms to recover tree and ring networks and show that if the network is in fact a tree or a ring, the algorithms solve the ILP optimally.
- Building upon the tree and the ring algorithms, we develop a polynomial time heuristic to identify general networks. Using simulations we show that this method outperforms the RGD1 algorithm of [11].
II Model
II-A Network Model
We model the network as a graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges. We assume that all the links in the network are bidirectional and have unit capacity. Each bidirectional link between nodes $i$ and $j$ is composed of two directed links $(i, j)$ and $(j, i)$. The network has two types of nodes: the overlay nodes, which represent hosts and can be controlled, and the underlay nodes, which represent routers that are uncontrollable and do not provide any direct feedback. We represent the set of overlay nodes by $O$ and the set of underlay nodes by $U$, and $V = O \cup U$. We further assume that each overlay node is connected to only one underlay node. Other than this, we do not have any knowledge of the structure of the network. The main goal of this paper is to recover the graph $G$ from data measured at the overlay nodes.
All the overlay nodes are connected to each other by tunnels, which are paths that go through the underlay nodes. A tunnel consists of overlay nodes and , and the rest of the nodes are underlay. Since and are each connected to only one underlay node, we will often refer to node as the parent of node , , and node as the parent of node , . There are a total of tunnels in the network.
We also assume that each node maintains a queue for each of its outgoing links . Packets from all the tunnels that use the link are enqueued in this queue when they reach node and are served on a first-come, first-served basis.
II-B The Interference Matrix
Our algorithm for recovering the graph is based on whether or not any two tunnels between the overlay nodes intersect with each other. In order to identify this we propose a simple method based on linear regression. We note that depending on the measurements available, other methods such as the ones from [6, 11] can also be used to derive this information.
Let represent the delay experienced by a packet that enters tunnel at time . Tunnels in the network can intersect with each other, hence, the path traversed by a tunnel can have packets belonging to itself and packets from other tunnels. Let represent the number of packets that belong to tunnel that are still in the tunnel at time . We will refer to these packets as packets in flight of tunnel . The delay experienced by a packet entering tunnel at time is affected by the number of packets in that tunnel and other tunnels that intersect with it. Considering only a pair of tunnels and , we can model the relationship between the packets in flight and delay as a linear function:
[TABLE]
Here represents the fraction of packets of tunnel that are in the path traversed by tunnel and is random perturbation (noise).
By injecting randomly generated traffic into each pair of tunnels and measuring the packet delay and packets in flight, it is possible to determine if two tunnels intersect. In particular, using linear regression it is possible to calculate the optimal parameters that minimize the noise for each pair of tunnels . When tunnels and do not intersect, the number of packets in tunnel does not affect the delay of packets entering tunnel , hence the corresponding coefficient is close to 0. Otherwise, it is closer to 1. We use these values to create the binary interference matrix . If then , and otherwise. Moreover, is symmetric, implying .
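As a rough illustration of the regression step, the sketch below fits one tunnel's delay against the in-flight packet counts of that tunnel and a second tunnel, then thresholds the cross coefficient. The synthetic traffic model, the function names, and the 0.5 threshold are assumptions made for this example and are not taken from the paper.

```python
import random


def estimate_cross_coefficient(n_own, n_other, delay):
    """Least-squares fit of delay ~ c_own * n_own + c_other * n_other.

    Returns c_other, the estimated effect of the other tunnel's
    in-flight packets on this tunnel's delay (no intercept term).
    """
    s_oo = sum(x * x for x in n_own)
    s_pp = sum(y * y for y in n_other)
    s_op = sum(x * y for x, y in zip(n_own, n_other))
    s_od = sum(x * d for x, d in zip(n_own, delay))
    s_pd = sum(y * d for y, d in zip(n_other, delay))
    det = s_oo * s_pp - s_op * s_op
    # Solve the 2x2 normal equations for the second coefficient.
    return (s_oo * s_pd - s_op * s_od) / det


def synthetic_samples(true_alpha, samples=2000, seed=0):
    """Assumed toy model: delay = own packets + alpha * other packets + noise."""
    rng = random.Random(seed)
    n_own = [rng.randint(0, 10) for _ in range(samples)]
    n_other = [rng.randint(0, 10) for _ in range(samples)]
    delay = [a + true_alpha * b + rng.gauss(0.0, 0.5)
             for a, b in zip(n_own, n_other)]
    return n_own, n_other, delay


def interferes(n_own, n_other, delay, threshold=0.5):
    """Binary interference decision; the threshold is an assumed value."""
    return estimate_cross_coefficient(n_own, n_other, delay) > threshold


shared = synthetic_samples(true_alpha=1.0)    # tunnels that share a link
disjoint = synthetic_samples(true_alpha=0.0)  # tunnels that do not
print(interferes(*shared), interferes(*disjoint))
```

Running the decision once per ordered tunnel pair would fill in one entry of the binary interference matrix at a time.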
We will use the graph representation of in some of our results. We refer to such a graph as the interference graph of the network, . This graph is simply the graph formed by using as an adjacency matrix, where consists of tunnels and an edge exists between tunnels that interfere with each other. An example of an interference matrix and its corresponding graph is given in Figure 1.
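When the topology and the routes are known, as in a simulation, the interference matrix and its graph can be computed directly. The sketch below assumes that two tunnels interfere when they share at least one directed link, and that the diagonal of the matrix is zero. The helper names and the small star network are hypothetical.

```python
from itertools import combinations


def directed_links(path):
    """Set of directed links (u, v) traversed by a path given as a node list."""
    return set(zip(path, path[1:]))


def interference_matrix(tunnels):
    """Symmetric 0/1 matrix: entry [a][b] is 1 if tunnels a and b share a directed link."""
    n = len(tunnels)
    links = [directed_links(t) for t in tunnels]
    matrix = [[0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        if links[a] & links[b]:
            matrix[a][b] = matrix[b][a] = 1
    return matrix


def interference_graph(matrix):
    """Adjacency sets of the interference graph, one vertex per tunnel."""
    return {a: {b for b, flag in enumerate(row) if flag}
            for a, row in enumerate(matrix)}


# Hypothetical star network: overlay nodes 1, 2, 3 attached to underlay node "u".
tunnels = [[1, "u", 2], [2, "u", 1], [1, "u", 3], [3, "u", 2]]
F = interference_matrix(tunnels)
print(F)
print(interference_graph(F))
```

Tunnels that traverse the same bidirectional link in opposite directions are not counted as interfering here, which matches the directed-link model of Section II-A.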
II-C Minimal topology
There exist many networks that produce the same interference matrix , hence, these networks are indistinguishable by our method. For example, each tunnel in the two networks shown in Figure 1 faces the same interference. E.g. the tunnel only interferes with tunnel in both the networks. Hence, they produce the same matrix. We are interested in the smallest network, in terms of the number of links, that produces the given matrix. We will call such a topology the minimal network topology.
A necessary condition for a network to be minimal was identified in [11]. Specifically, all underlay nodes must have at least three neighbors. If an underlay node has only one neighbor, we can simply remove it to obtain a smaller network that is indistinguishable from the original network by using only the measurements available at the overlay. If an underlay node has two neighbors, we can connect its two neighbors and remove the node in order to obtain a smaller network with the same properties. We note that this condition is not sufficient for minimality in general. E.g. in Figure 1(a), all the underlay nodes have 3 neighbors but the topology is not minimal. We will provide a sufficient condition for minimality, and show that the necessary condition from [11] is also sufficient for specific topologies, namely trees and rings.
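The two reduction rules behind the necessary condition can be sketched as follows. This is a minimal illustration on an undirected edge list. The function name and the example network are hypothetical, and parallel edges created by splicing are merged into one, which is a simplification.

```python
def reduce_underlay(edges, underlay):
    """Repeatedly drop underlay nodes with fewer than two neighbors and
    splice out underlay nodes with exactly two neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    changed = True
    while changed:
        changed = False
        for node in [n for n in list(adj) if n in underlay]:
            neighbors = adj[node]
            if len(neighbors) > 2:
                continue
            # Detach the node from its neighbors.
            for nb in neighbors:
                adj[nb].discard(node)
            # With exactly two neighbors, connect them directly.
            if len(neighbors) == 2:
                a, b = neighbors
                adj[a].add(b)
                adj[b].add(a)
            del adj[node]
            changed = True
    return {frozenset((u, v)) for u in adj for v in adj[u]}


# Hypothetical network: overlay nodes a, b, c; underlay nodes u1..u4.
edges = [("a", "u1"), ("u1", "u2"), ("u2", "u3"),
         ("u3", "b"), ("u2", "c"), ("u2", "u4")]
reduced = reduce_underlay(edges, {"u1", "u2", "u3", "u4"})
print(sorted(sorted(e) for e in reduced))
```

In this example only the underlay node `u2` survives, since it is the only one with at least three neighbors after the reductions.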
In this paper we assume that the matrix for a network is given (i.e. obtained via measurements, as described earlier) and focus on obtaining the minimal network that supports this interference pattern.
III Integer Programming Formulation
We formulate the problem of finding the minimal network that supports the given path interference pattern as an integer linear program (ILP). Although solving this ILP is computationally intractable for large networks, studying this formulation will provide us with useful insights into the problem. Also, when the network is small, we are able to solve it optimally.
III-A Integer program
Let us consider a network with nodes. Nodes are overlay nodes and nodes are underlay nodes. Note that the set is known a priori.
Let represent whether link is used by tunnel , for , , and . For notational simplicity, we define another variable which represents whether the link is used by any tunnel in either direction. Hence,
[TABLE]
Here “∨” is the logical OR operator. Note that such logical constraints can easily be transformed into a set of linear (integer) constraints [15]. The objective function can be written as
[TABLE]
Our network model assumes that each overlay node is connected to only one underlay node. This can be enforced by using the following constraint:
[TABLE]
Again, to simplify the notation we define two new variables. Let represent whether tunnel begins at node , and let represent whether tunnel ends at node . These values are known a priori, so we can replace these variables with their respective values while formulating a specific problem. Now we can write the next set of constraints which are essentially the flow conservation constraints. These constraints guarantee that each tunnel has a set of connected links in the network, starting and ending at its respective overlay nodes.
[TABLE]
We can see that the flow conservation constraints above allow loops to be formed in the network. Unlike max-flow type problems, where loops can be removed in post-processing without harming feasibility, removing them in our case can change the interference pattern of the tunnels. Hence we need to add constraints to avoid the formation of loops.
Similar problems arise in the ILP formulation of the Travelling Salesman Problem (TSP). We use the technique originally proposed by Miller, Tucker, and Zemlin in [14] to resolve this issue in the TSP, and add the following constraints:
[TABLE]
Here, the variables are used to assign an order to each node in each tunnel . If then so the next node is assigned a higher value than node . Otherwise, This ensures that there are enough values to assign to all the nodes that the tunnel might pass through.
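For reference, the standard Miller-Tucker-Zemlin ordering constraint, written once per tunnel, has the form sketched below. The symbols $x^{t}_{ij}$ (tunnel $t$ uses directed link $(i,j)$), $u^{t}_{i}$ (order of node $i$ in tunnel $t$) and $n$ (number of nodes) are notation assumed for this sketch and may differ from the paper's.

```latex
u^{t}_{i} - u^{t}_{j} + n\, x^{t}_{ij} \le n - 1
\qquad \forall\, t,\ \forall\, (i,j),\ i \ne j,
\qquad 1 \le u^{t}_{i} \le n .
```

When $x^{t}_{ij} = 1$ the constraint reduces to $u^{t}_{j} \ge u^{t}_{i} + 1$, so the order strictly increases along the tunnel and no loop can close. When $x^{t}_{ij} = 0$ it reduces to $u^{t}_{i} - u^{t}_{j} \le n - 1$, which any values between 1 and $n$ satisfy.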
Finally we consider the interference constraints. For each tunnel pair we add a set of constraints depending on whether tunnels and interfere with each other. If tunnels and do not interfere we have the following constraints:
[TABLE]
This ensures that two tunnels that do not interfere with each other are never assigned to the same link. If , then both the tunnels and must appear together in at least one of the links. We enforce this with the following constraints:
[TABLE]
Here “∧” is the logical AND operator, and these constraints can also be transformed into a set of linear (integer) constraints.
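One common way to linearize such logical constraints over binary variables is sketched below. The variable names ($x_k$, $y$, $w$, $z$, $z_{ij}$) are assumed for illustration, and the transformation used in [15] may differ in detail.

```latex
% y = x_1 \vee x_2 \vee \dots \vee x_m  (logical OR)
y \ge x_k \quad \forall\, k \in \{1,\dots,m\}, \qquad
y \le \sum_{k=1}^{m} x_k, \qquad y \in \{0,1\}.

% z = x \wedge w  (logical AND)
z \le x, \qquad z \le w, \qquad z \ge x + w - 1, \qquad z \in \{0,1\}.

% Two interfering tunnels must share at least one directed link,
% where z_{ij} is the AND of the two tunnels' indicators on link (i,j):
\sum_{(i,j)} z_{ij} \ge 1.
```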
The objective function along with the constraints (1) through (6) give the required ILP for identifying a minimal network. After solving the ILP, the graph can be recovered from the links for which . A node that is not used by any of the tunnels can simply be removed from the recovered network.
III-B Example
We consider a network where 6 underlay nodes are arranged to form a grid, and an overlay node is attached to each underlay node. The network uses the shortest path routing. The interference matrix is generated by determining whether two paths intersect with each other. We formulate the ILP with then solve it using the Gurobi solver [16].
The original and the recovered network are shown in Figure 2. The recovered network has fewer nodes and edges than the original network. Link in the original network is used only by tunnels and in different directions. Hence there is no interference on this link, and it can be removed without changing the interference matrix. For the same reason, link can be removed to obtain the minimal network. Even after the removal of the links, we can see that the recovered network looks quite similar to the original.
III-C Upper bound
We provide an upper bound on the solution to the ILP in the previous section by using a simple algorithm given in Algorithm 1. This algorithm produces a feasible solution to the ILP by assigning two interfering tunnels to a new link in . This algorithm can be suboptimal because in an optimal solution many tunnels can interfere at the same link.
Algorithm 1 starts with a that is a line graph with edges, then maps each link in the interference graph to a link in . Each edge in represents two tunnels that pass through the same edge in , so if there is an edge between tunnels and in , then tunnels and are assigned to one of the links in . When all the interferences are assigned, it is likely that the same tunnel gets assigned to links that are not attached to each other. In such a case, new links are added to create complete tunnels. An example of this process (Steps 1-3) is given in Figure 3. At the end of Step 3, all the interference constraints are satisfied. Steps 4-8 add the overlay nodes and make sure that each overlay node is connected to a single underlay node.
We give the following lemma to show that Algorithm 1 produces a feasible solution to the ILP. Then Theorem 1 establishes the upper bound on the number of links used by this algorithm.
Lemma 1.
Algorithm 1 obtains a feasible solution to the ILP in Section III-A.
Proof.
The proof of this lemma is given in Appendix A. ∎
Theorem 1.
The number of edges required for a feasible solution of the ILP, .
Proof.
The proof of this theorem is given in Appendix B. ∎
III-D Lower bound
We establish a lower bound on the number of edges in the minimal graph using the properties of the interference graph. In order to minimize the number of links, we want to assign as many interfering tunnels as possible to the same link. However, we cannot assign two tunnels to the same link if they do not interfere with each other. This property is nicely abstracted by the cliques in the interference graph . The tunnels, represented by the nodes in , that are in the same clique interfere with each other, so we can assign all of them to the same link. A lower bound is given by the minimum number of cliques required to cover all the edges of the interference graph. For example, two cliques are needed to cover all the edges of the interference graph in Figure 3(a), so we need at least two links in to represent all the interferences. In graph theory the smallest such set is known as the minimum edge clique cover (this is different from the minimum node clique cover, which is the smallest set of cliques required to cover all the nodes), and the size of such a set is known as the intersection number of the graph [17]. Computing the minimum edge clique cover of a graph is known to be NP-hard, so it might not be useful for the purpose of comparing our solutions. However, in the next subsection we will use it to derive conditions under which a recovered graph achieves the lower bound and is guaranteed to be optimal.
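For very small interference graphs the intersection number can be found by exhaustive search. The sketch below is a brute-force illustration only: it enumerates every clique and every combination of cliques, so its running time grows exponentially. The function name and the adjacency-set input format are assumptions.

```python
from itertools import combinations


def minimum_edge_clique_cover(adj):
    """Size of the smallest set of cliques that covers every edge.

    adj maps each vertex to the set of its neighbors (undirected graph).
    """
    nodes = sorted(adj)
    edges = {frozenset((u, v)) for u in nodes for v in adj[u]}
    if not edges:
        return 0
    # Edge sets of every clique with at least two vertices.
    clique_edges = []
    for size in range(2, len(nodes) + 1):
        for group in combinations(nodes, size):
            pairs = {frozenset(p) for p in combinations(group, 2)}
            if pairs <= edges:
                clique_edges.append(pairs)
    # Try covers of increasing size until one covers all edges.
    for k in range(1, len(edges) + 1):
        for chosen in combinations(clique_edges, k):
            if set().union(*chosen) == edges:
                return k
    return len(edges)


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(minimum_edge_clique_cover(triangle),
      minimum_edge_clique_cover(path),
      minimum_edge_clique_cover(cycle4))
```

A triangle is a single clique, while a four-cycle contains no clique larger than an edge, so each of its four edges needs its own clique.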
The following lemma presents the lower bound result in terms of the number of directed links required to have a feasible solution. Theorem 2 extends this result to the case with undirected links, which is the setup in this paper.
Lemma 2.
Let be the number of directed links required for a feasible solution of the ILP. Let C be the size of the minimum edge clique cover for the interference graph . Then .
Proof.
The proof of this lemma is given in Appendix C. ∎
Theorem 2.
Let be the number of undirected links required for a feasible solution of the ILP. Let C be the size of the minimum edge clique cover for the interference graph . Then,
[TABLE]
Proof.
The proof of this theorem is given in Appendix D. ∎
III-E A sufficient condition for optimality
We give a condition under which the recovered network has the same number of edges as the original network. When this condition is satisfied, the interference pattern cannot be achieved in a smaller network, so this result also provides a sufficient condition for minimality of a network. We prove this result by showing that if the condition is satisfied, then the recovered network achieves the lower bound developed in the previous subsection. We use this result in the subsequent sections to show that our polynomial time algorithms optimally solve the ILP for special networks.
The main result of this subsection says that a given network is minimal if every directed edge in the network is associated with a unique interference (interfering pair of tunnels). Intuitively, this condition seems reasonable because if it is satisfied then each directed link in the graph creates a unique clique in the minimum edge clique cover of the interference graph.
Lemma 3.
The size of the minimum edge clique cover of , if and only if for each directed edge there exists a pair of tunnels and such that they intersect at link and nowhere else.
Proof.
The proof of this lemma is given in Appendix E. ∎
Theorem 3.
Let C be the size of the minimum edge clique cover for the interference graph . Let be the optimal network obtained by solving the ILP. If every directed link has a pair of tunnels and such that they intersect at link and nowhere else, then has the same number of edges as the original network, i.e. .
Proof.
The proof of this theorem is given in Appendix F. ∎
Note that this theorem provides a sufficient condition but it may not be necessary. That is, there may be graphs where but the ILP still produces a graph with edges. Also, if the number of edges in the optimal network obtained by solving the ILP is the same as in the original network, then we know that both networks are minimal. Hence, we can use the condition in the theorem as a sufficient condition for minimality of a network.
Corollary 1.
A network is minimal if every directed link has a pair of tunnels and such that they intersect at link and nowhere else.
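The condition of Corollary 1 can be checked mechanically when the topology and tunnels are known. The sketch below assumes tunnels are given as node lists and that two tunnels intersect where they share a directed link. The function names and both example networks are hypothetical.

```python
from itertools import combinations, permutations


def directed_links(path):
    """Set of directed links (u, v) traversed by a path given as a node list."""
    return set(zip(path, path[1:]))


def every_link_has_unique_interference(edges, tunnels):
    """True if each directed link is the only shared link of some tunnel pair.

    edges: undirected (u, v) pairs; tunnels: list of node-list paths.
    """
    required = {(u, v) for u, v in edges} | {(v, u) for u, v in edges}
    link_sets = [directed_links(t) for t in tunnels]
    witnessed = set()
    for a, b in combinations(link_sets, 2):
        common = a & b
        if len(common) == 1:
            witnessed |= common
    return required <= witnessed


# Star with three overlay nodes around one underlay node "u".
star_edges = [(1, "u"), (2, "u"), (3, "u")]
star_tunnels = [[s, "u", d] for s, d in permutations([1, 2, 3], 2)]

# Two overlay nodes joined through a single underlay node.
line_edges = [(1, "u"), (2, "u")]
line_tunnels = [[1, "u", 2], [2, "u", 1]]

print(every_link_has_unique_interference(star_edges, star_tunnels))
print(every_link_has_unique_interference(line_edges, line_tunnels))
```

The second network fails the check, consistent with the fact that its underlay node has only two neighbors and can be spliced out.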
IV Identifying Trees
We design a polynomial time algorithm to recover a tree network. If is a minimal tree, i.e. every non-leaf node has at least three neighbors and all the leaf nodes are overlay nodes, then this algorithm recovers the tree exactly. A similar result on recovering trees by using the distance between the leaf nodes is given in [12]; however, the algorithm of [12] requires the network to be minimal. When the network is a non-minimal tree, our algorithm produces a that is a minimal tree corresponding to , since both networks have the same matrix. Note that there is a unique minimal tree corresponding to each non-minimal tree, which can be obtained by using the process discussed in Section II-C.
IV-A Algorithm
The tree identification algorithm is given in Algorithm 2. The algorithm uses the interference matrix to obtain a tree graph with the same . It begins by initializing the graph and checking for terminating conditions in Steps 1 to 3. In Step 4, the algorithm identifies a node such that when all its siblings along with itself are removed, its parent becomes a leaf node. This property will later help us compute a new matrix of the reduced graph. In Step 5, the algorithm finds a group of nodes that consists of all the sibling nodes of . Procedure 3 is used to identify such nodes; see Lemma 5 for proof. These nodes are then added to the recovered graph in Step 6 by assigning them a common parent node, .
Step 7 removes the sibling nodes in from the original network . Since the graph is not available, the removal is done indirectly by removing the corresponding tunnels from the matrix. Note that node is not removed; instead it is renamed as the parent of the group in Step 8. This works because when all the siblings of are removed, the interference of the tunnels that start or end at is the same as that of the tunnels that start or end at its parent node. The algorithm is iteratively applied to the reduced matrix until only one or two leaf nodes remain.
The graphs created after the first and the second iterations of this algorithm are shown in Figure 4. In the first iteration, Step 4 identifies one of the tunnels that intersect with the largest number of other tunnels, (5,…,1). So in obtained in Step 5. This avoids obtaining sibling groups such as , which when removed do not make their parent a leaf node. Step 6 produces the shown in Figure 4(c), and Steps 7 and 8 result in the reduced tree shown in Figure 4(b). The matrix of the reduced tree is obtained by removing all the tunnels with nodes 6 and 7, then renaming node 5 to the parent node . Similarly, the result of the second iteration is shown in Figures 4(d) and 4(e). Since there is only one group of siblings left in the graph after this iteration, the third iteration results in the with only one node. Also, the third iteration produces the that is identical to the original graph in Figure 4(a).
IV-B Analysis
In order to prove that Algorithm 2 obtains the minimal tree, we first show that Step 4 identifies a node whose parent becomes a leaf node when we perform the node removal in Step 7. In Step 8 of the algorithm, this allows us to use the interference properties of the tunnels starting or ending at to obtain the interference of the tunnels of the parent node.
Lemma 4.
Let be the tunnel that interferes with the largest number of other tunnels. When all the leaf nodes connected to are removed, becomes a leaf node in the resulting graph.
Proof.
The proof of the lemma is given in Appendix G. ∎
The following lemma shows that Procedure 3 identifies the nodes that share the same parent. The main idea behind the proof is that a path between two nodes that share the same parent interferes with only the tunnels starting or ending at these nodes.
Lemma 5.
Two leaf nodes of a tree and share the same parent if and only if the tunnel from to does not interfere with any tunnel such that or .
Proof.
The proof of this lemma is given in Appendix H. ∎
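The sibling test can be illustrated on a small tree. The sketch below reads the lemma as follows, which is an assumption about its exact statement: leaves i and j share a parent if and only if the tunnel from i to j interferes with no tunnel that neither starts at i nor ends at j. The tree, the helper names, and the dictionary form of the interference data are hypothetical.

```python
from collections import deque
from itertools import permutations


def tree_path(adj, src, dst):
    """Unique path between two nodes of a tree, found by breadth-first search."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    path = [dst]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def interference(tunnels):
    """Map each ordered pair of distinct tunnel ids to True if they share a directed link."""
    links = {t: set(zip(p, p[1:])) for t, p in tunnels.items()}
    return {(s, t): bool(links[s] & links[t]) for s, t in permutations(links, 2)}


def are_siblings(i, j, F, leaves):
    """Sibling test that reads only the interference map F."""
    others = [(k, l) for k, l in permutations(leaves, 2) if k != i and l != j]
    return not any(F[(i, j), other] for other in others)


# Hypothetical minimal tree: leaves 1, 2 under "p" and leaves 3, 4 under "q".
adj = {1: ["p"], 2: ["p"], 3: ["q"], 4: ["q"],
       "p": [1, 2, "q"], "q": [3, 4, "p"]}
leaves = [1, 2, 3, 4]
tunnels = {(s, d): tree_path(adj, s, d) for s, d in permutations(leaves, 2)}
F = interference(tunnels)
print(are_siblings(1, 2, F, leaves), are_siblings(1, 3, F, leaves))
```

Leaves 1 and 3 fail the test because the tunnel from 2 to 4 shares the link between the two internal nodes with the tunnel from 1 to 3.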
Now we prove the following theorem that shows that the algorithm recovers the minimal tree network.
Theorem 4.
If a given network is a minimal tree, then Algorithm 2 recovers the network.
Proof.
The proof of this theorem is given in Appendix I. ∎
Note that not only is the recovered graph isomorphic to , but the relative positions of the overlay nodes are also the same. That is, if the overlay nodes and share the same parent in , they also share the same parent in . Also, because the matrix for a non-minimal tree is the same as that of the minimal version of the tree, and the minimal tree is unique for any non-minimal tree, we get the following corollary.
Corollary 2.
If a given network is a non-minimal tree, then the tree recovered by Algorithm 2 is the unique minimal tree for .
The following corollary states that the graph generated by the tree algorithm solves the ILP optimally. This is true simply because all minimal trees satisfy the condition of Theorem 3.
Corollary 3.
If the interference pattern in a matrix can be represented in a tree, Algorithm 2 produces a that solves the ILP optimally.
Note that even when is not a tree, Algorithm 2 can produce a tree as long as the interference can be represented by a tree. However if the interference pattern cannot be represented by a tree this algorithm will either fail Step 4, or the algorithm terminates but the recovered has a different interference matrix than .
V Identifying Rings
We now consider a situation when the matrix cannot be represented in a tree. Specifically we consider a graph where the underlay nodes are arranged in a ring, and each underlay node is attached to exactly one overlay node. Also, we will assume that the network uses a shortest path routing algorithm, hence, the tunnels take the shortest paths between the overlay nodes. If the matrix can be represented in a ring, our algorithm identifies the order of the overlay nodes. Note that knowing the order of the nodes gives more information than just recovering isomorphic graphs, e.g. in [11]. Just like the tree discovery algorithm in the previous section, this algorithm can also be used to show that a particular network is not a ring.
V-A Algorithm
The ring identification algorithm is given in Algorithm 4. This algorithm builds the ring in an incremental fashion. First, in Step 1 an overlay node and its parent node are added to . The key idea behind the algorithm is in Step 2. It uses the matrix to identify two overlay nodes in the ring that are closest to node , i.e. two nodes such that their parents are neighbors of . In Steps 3 to 5 we attach the two nodes to their parents, and connect the parents to .
V-B Analysis
We will show that Algorithm 4 is guaranteed to recover the correct ring if the ring has more than four overlay nodes. With three overlay nodes, any ordering of the nodes is the same because the network links are bidirectional, so using the algorithm is unnecessary. The algorithm might not produce the correct result for a network with four overlay nodes if the tunnels between the nodes on opposite sides pass through the same set of nodes. The networks in both of these situations, with 3 or 4 overlay nodes in a ring, are not minimal.
Lemma 6.
Let be a graph where the underlay nodes are arranged in a ring, and each underlay node is connected to exactly one overlay node. Let . Let and be two overlay nodes and be the tunnel from to . Underlay nodes and are neighbors if and only if tunnel interferes with the fewest number of other tunnels.
Proof.
The proof of this lemma is given in Appendix J. ∎
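The lemma can be checked numerically on small rings. The sketch below builds a ring with one overlay node per underlay node, routes every tunnel along its shortest path, and counts interferences per tunnel. The ring sizes (5 and 7, chosen odd so that shortest paths are unique), the node labelling, and the function names are assumptions for this example.

```python
from itertools import permutations


def ring_tunnel(src, dst, n):
    """Shortest path from overlay src to overlay dst on an n-node ring (n odd)."""
    step = 1 if (dst - src) % n <= n // 2 else -1
    path = [("o", src), ("u", src)]
    node = src
    while node != dst:
        node = (node + step) % n
        path.append(("u", node))
    path.append(("o", dst))
    return path


def interference_counts(n):
    """Number of other tunnels that share a directed link with each tunnel."""
    tunnels = {(s, d): ring_tunnel(s, d, n)
               for s, d in permutations(range(n), 2)}
    links = {t: set(zip(p, p[1:])) for t, p in tunnels.items()}
    return {t: sum(1 for o in links if o != t and links[t] & links[o])
            for t in links}


def closest_overlay_nodes(src, n):
    """Two destinations whose tunnels from src interfere with the fewest tunnels."""
    counts = interference_counts(n)
    ranked = sorted((d for d in range(n) if d != src),
                    key=lambda d: counts[(src, d)])
    return set(ranked[:2])


print(closest_overlay_nodes(0, 5))
print(closest_overlay_nodes(0, 7))
```

In both rings the two selected nodes are the ones whose parents are adjacent to the parent of node 0, which is the property the ring algorithm relies on in its Step 2.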
Algorithm 4 identifies overlay nodes whose parent nodes are neighbors and pieces them together into a ring. Hence, Theorem 5 follows directly from the lemma above.
Theorem 5.
If the given network is a minimal ring, Algorithm 4 recovers the network.
Similar to the tree identification algorithm, this algorithm will produce a corresponding minimal ring if the original network is a non-minimal ring. This is true because both the rings have the same matrix. Also, because a minimal ring satisfies the sufficient condition for minimality, this algorithm optimally solves the ILP for ring networks. Hence we get the following two corollaries.
Corollary 4.
If a given network is a non-minimal ring with , then the ring recovered by Algorithm 4 is the unique minimal ring for .
Corollary 5.
If the interference pattern in the matrix with can be represented in a ring, Algorithm 4 produces a that solves the ILP optimally.
VI Identifying General Networks
Inspired by the algorithms for identifying trees and rings in the previous sections, we develop a scheme for identifying general networks. A network can consist of trees and rings connected to each other. Our algorithm assumes that the network uses shortest path routing, and attempts to separate the trees from the rest of the graph, and identify these components separately. We will use Algorithm 2 for recovering the trees, and we will design a new algorithm inspired by Algorithm 4 for the non-tree components. Finally we will combine the discovered components to obtain the full network. This scheme is largely a heuristic, hence, we will compare its performance against another algorithm that also discovers general graphs.
VI-A Algorithm
We first present Algorithm 5, which is designed to recover a graph where every underlay node is part of one or more cycles and only one overlay node is attached to each underlay node. The algorithm works in a similar fashion to the ring recovery algorithm from the previous section. The difference is that now each underlay node can have more than two underlay neighbors. So, for each overlay node , the algorithm attempts to find all the overlay nodes whose parents are neighbors of . For clarity, we present this part of the algorithm separately in Procedure 6.
The main idea behind Procedure 6 is shown in an example in Figure 5. For Node 1, the procedure first identifies two neighbors of using the tunnels that start at 1 and intersect with the fewest other tunnels. The intuition is the same as for the ring algorithm from the previous section. However, when there is more than one ring, it is not guaranteed that the shortest tunnels have the fewest interferences. It is possible that the tunnel (1,…,5) intersects with the same number of tunnels as (1,…,3). After identifying the two neighbors, the procedure avoids any tunnels that pass through these neighbors and identifies other shortest tunnels.
Finally, we present Algorithm 7 for identifying networks with multiple rings and trees. In Step 2, this algorithm identifies sets of overlay nodes that could be part of a tree using Procedure 3. Step 2(i) identifies the siblings, , of node . Step 2(ii) obtains the siblings of all the nodes in . If is a sibling of , then must also be a sibling of . Using this property, Step 2(iii) attempts to reduce false positives. Step 2(iv) adds the nodes that are identified as part of a tree into the set of existing nodes. If some part of the tree containing the nodes in has already been identified, then these nodes must have one node in common with , i.e. exists. In such a case, the nodes in are added to ; otherwise is added as a new element . The tunnels belonging to all but one node in are removed from , and Step 2 is repeated on this new interference matrix. The completion of Step 2 produces the set such that each element of is a set of nodes that belong to the same tree.
Step 3 of the algorithm retrieves the original matrix. Then in Step 4, Algorithm 2 is used on the elements of to discover their corresponding trees. If the tree identification algorithm completes successfully, then all but one of the overlay nodes belonging to the tree are removed from the matrix. The node that is not removed acts as an anchor node while combining the trees and the rest of the graph. In Step 5, the resulting matrix is used in Algorithm 5 to recover the non-tree part of the graph. To combine a tree with the non-tree graph, Step 6 finds the anchor node corresponding to the tree in the graph. Then, in Steps 6(ii) and 6(iii), attempts are made to connect the tree to the anchor node at different locations in the tree. The algorithm keeps the connection that minimizes the difference between the interference matrix of the resulting graph and the original matrix.
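The selection rule in Step 6 compares the interference pattern of a candidate graph with the measured one. The following is a minimal sketch of such a comparison, not the authors' implementation. It assumes undirected links, shortest path routing computed by breadth-first search with a fixed tie-breaking order, one tunnel per unordered pair of overlay nodes, and a connected graph. The function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def shortest_path_links(adj, src, dst):
    """Links (as frozensets) on one BFS shortest path from src to dst."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in sorted(adj[u]):  # sorted() fixes the tie-breaking order
            if v not in parent:
                parent[v] = u
                queue.append(v)
    links, v = set(), dst
    while parent[v] is not None:
        links.add(frozenset((v, parent[v])))
        v = parent[v]
    return links

def interfering_pairs(adj, overlay):
    """Set of unordered tunnel pairs that share at least one link."""
    tunnels = {t: shortest_path_links(adj, *t)
               for t in combinations(sorted(overlay), 2)}
    return {frozenset((s, t)) for s, t in combinations(tunnels, 2)
            if tunnels[s] & tunnels[t]}

def mismatch(adj_candidate, adj_original, overlay):
    """Number of tunnel pairs whose interference status differs."""
    return len(interfering_pairs(adj_candidate, overlay)
               ^ interfering_pairs(adj_original, overlay))
```

Under these assumptions, a candidate that reproduces every measured interference has a mismatch of zero, and a heuristic in the spirit of Step 6 would keep the attachment point with the smallest mismatch.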
VI-B Simulation result
We compare the performance of Algorithm 7 against that of the RGD1 algorithm from [11]. For the implementation of RGD1, we obtain the exact length of each path by using a shortest path algorithm. All links are assumed to have unit length. We choose the parameter to be 4. We also tried the values 3 and 5 for this parameter; however, the performance was not as good.
The graphs used to obtain the data for the simulation were generated to be similar to the random graphs considered in [11]. We first generate an Erdős-Rényi random graph with parameters . Then we find the largest connected component of the graph, and remove all the other nodes that do not belong to this component. We then attach overlay nodes to 80% of the remaining nodes uniformly at random. Finally, we remove any underlay nodes that have degree less than 3 by using the process discussed in Section II-C. We generate 100 networks for each value of , where and obtain the measurements required for both algorithms: distances for RGD1 and the matrix for our algorithm. Finally, we use the measurements to recover the graphs.
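The first steps of this generation procedure can be sketched as follows. This is an illustrative sketch using only the standard library, not the authors' simulation code. The values of `n` and `p` are left to the caller because they are not fixed here, and the final pruning of low-degree underlay nodes (the process of Section II-C) is omitted. All function names are hypothetical.

```python
import random
from collections import deque

def erdos_renyi(n, p, rng):
    """Undirected G(n, p): each of the n*(n-1)/2 links appears with probability p."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def largest_component(adj):
    """Subgraph induced by the largest connected component."""
    seen, best = set(), set()
    for s in adj:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return {u: adj[u] & best for u in best}

def attach_overlay(adj, fraction, rng):
    """Choose a fraction of the underlay nodes, uniformly at random, to host overlay nodes."""
    k = round(fraction * len(adj))
    return set(rng.sample(sorted(adj), k))
```

Passing a seeded `random.Random` instance as `rng` makes a generated network reproducible, which is convenient when the same graph must be fed to two algorithms.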
The performance of the two algorithms was measured by computing the edit distance between the original graph and the recovered graph . Edit distance measures the number of links in that need to be added or removed in order to make it isomorphic to . This metric is similar to the metric used in [11] to obtain the asymptotic bounds of RGD1. Unfortunately, calculating the graph edit distance is an NP-hard problem, so we use an open source tool called GEDEVO [21] to approximate it.
The results of the simulations are given in Figure 6. Figures 6(a) and 6(b) show the performance of the two algorithms for each of the 100 graphs that were generated. We can see that in most of the cases, our algorithm outperforms RGD1. Figure 6(c) shows the average performance of the two algorithms across different values of . Again, we can see that our algorithm outperforms RGD1.
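To make the metric concrete, the sketch below computes this link edit distance exactly by brute force for very small graphs. It is not GEDEVO and does not reflect how that tool works. It assumes both graphs are given on the same node set, and it tries every relabeling, so its running time grows factorially with the number of nodes. The function name is hypothetical.

```python
from itertools import permutations

def edge_edit_distance(nodes, edges_g, edges_h):
    """Minimum number of link additions and removals that turn graph H into a
    graph isomorphic to G, found by trying every relabeling of H's nodes."""
    g = {frozenset(e) for e in edges_g}
    best = None
    for perm in permutations(nodes):
        relabel = dict(zip(nodes, perm))
        h = {frozenset((relabel[a], relabel[b])) for a, b in edges_h}
        cost = len(g ^ h)  # links present in exactly one of the two graphs
        if best is None or cost < best:
            best = cost
    return best
```

For a fixed node correspondence the distance is just the size of the symmetric difference of the two link sets; the hardness comes from minimizing over correspondences, which is why an approximation tool is used for graphs of realistic size.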
VII Conclusion
We developed a new method for discovering the topology of a network. It uses path interference information, which can be obtained from the measurements available at the end nodes. Using the path interference, we formulated an integer linear program that finds a minimal graph that can contain all the interferences. We then developed polynomial time algorithms that solve the ILP for the special cases when the network is a tree or a cycle. Finally, we developed a heuristic for identifying general networks and compared its performance to a well-known algorithm. Future research in the area will focus on developing better heuristics for general networks and providing performance guarantees.
Appendix A Proof of Lemma 1
Step 5 satisfies Constraint 2. Constraints 3 and 4 are satisfied because the initial graph formed in Step 2 is a line, and the rest of the steps never create branches or rings. Constraints 5 and 6 are satisfied because each interaction in the interference graph, i.e. the matrix, is represented in one of the links in , and two tunnels that do not interfere are never assigned to the same link.
Appendix B Proof of Theorem 1
Lemma 1 shows that Algorithm 1 obtains a feasible solution. We need to establish the upper bound to prove the theorem. Step 1 of the algorithm adds edges to . The number of edges added by Step 3 in the worst case is upper bounded by because each tunnel can require a maximum of extra edges. Step 5 adds edges, and Step 8 can add a maximum of edges. Hence we get the required upper bound.
Appendix C Proof of Lemma 2
A clique in the minimum edge clique cover of a graph has at least one unique edge, i.e. an edge that is not part of any other clique. If this were not the case, we could obtain a cover with fewer cliques simply by removing clique . Because each edge represents an interference, each unique edge must be assigned to a different link in .
If , then two unique edges of the interference graph have been assigned to the same edge of . This contains at least two tunnels that do not interfere with each other which violates the interference constraints in the ILP.
Appendix D Proof of Theorem 2
Given a graph with directed edges, we consider the problem of assigning the same tunnels in an undirected network. If every edge in the directed network is used by the tunnels in both directions, then . That is, links become a single link . However, in the directed network, some of the links can be used only in one direction. Hence, . The result follows directly from Lemma 2.
Appendix E Proof of Lemma 3
First we show that if there exist tunnel pairs and that intersect at link and nowhere else, then . We know that provides a feasible solution to the ILP, hence from Theorem 2, . Also, the interference graph has a clique corresponding to each directed edge in G as long as some flows intersect in this link. It is sufficient to show that if the condition is satisfied, then each clique in corresponds to a unique link .
Let be the clique corresponding to the directed link . has a link between the nodes and , and this link is not part of any other clique. Hence, must be a clique in the minimum edge clique cover. This shows that there is one to one correspondence between the cliques in the minimum edge clique cover of and the links of .
Next we show that if then there exist tunnel pairs and that intersect at link and nowhere else. Let be the set of all the tunnels that pass through at link . Note that must have at least two tunnels, because if has fewer than two tunnels then there is no clique corresponding to link giving .
For contradiction, assume that every pair of tunnels also intersects at some other link . Now we can consider a set of cliques corresponding to every link in the network other than link and cover all the edges in the interference graph giving .
Appendix F Proof of Theorem 3
We know from Theorem 2 that . We also know that the original network provides a feasible solution to the ILP, so . Hence,
[TABLE]
By Lemma 3, when the condition in the theorem statement is satisfied . Hence by a sandwiching argument .
Appendix G Proof of Lemma 4
The proof is by contradiction. Assume that interferes with the largest number of other tunnels, but when all the siblings of are removed is not a leaf node. Because of this assumption, has at least one neighbor node such that is not a leaf node as shown in Figure 7. Since is a minimal tree, the subtree of , formed by removing the link , has at least two leaf nodes and .
Consider a graph formed by removing the neighbors of node other than . In this graph, because of symmetry, a tunnel from to interferes with the same number of tunnels as . Clearly, in graph , the tunnel from to interferes with more tunnels because in addition to all the tunnels that the path from to interferes with it also interferes with the tunnel from to . This leads to a contradiction.
Appendix H Proof of Lemma 5
If and share the same parent, the path from to contains only two links and . None of the tunnels that don’t start in or end in use these links, hence no such tunnels intersect with the tunnel from to .
If and do not share the same parent, must be connected to a leaf node in the subgraph obtained by removing the link . Similarly must be connected to a leaf node in the subgraph obtained by removing the link . The tunnel connecting the nodes and intersects with the tunnel connecting and .
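The condition in this lemma can be checked mechanically on a small tree. The sketch below illustrates the condition only and is not the authors' Algorithm 3. It assumes each tunnel is given as the set of links on its path, with links written as hypothetical string labels, and that tunnels are keyed by their two endpoint names. The example tree has leaves `a` and `b` under parent `x`, leaves `c` and `d` under parent `y`, and a link between `x` and `y`.

```python
from itertools import combinations

def interference_sets(paths):
    """Map each tunnel to the set of tunnels that share a link with it."""
    F = {t: set() for t in paths}
    for s, t in combinations(paths, 2):
        if paths[s] & paths[t]:
            F[s].add(t)
            F[t].add(s)
    return F

def share_parent(i, j, F):
    """True if every tunnel interfering with tunnel (i, j) starts or ends at i or j."""
    tunnel = (i, j) if (i, j) in F else (j, i)
    return all(i in other or j in other for other in F[tunnel])

# Hand-written paths for the example tree described above.
paths = {
    ("a", "b"): {"a-x", "b-x"},
    ("a", "c"): {"a-x", "x-y", "c-y"},
    ("a", "d"): {"a-x", "x-y", "d-y"},
    ("b", "c"): {"b-x", "x-y", "c-y"},
    ("b", "d"): {"b-x", "x-y", "d-y"},
    ("c", "d"): {"c-y", "d-y"},
}
F = interference_sets(paths)
siblings_ab = share_parent("a", "b", F)  # a and b hang off the same parent x
siblings_ac = share_parent("a", "c", F)  # tunnel (a, c) crosses link x-y, as does (b, d)
```

In this example the pair `a`, `b` passes the check, while `a`, `c` fails because the tunnel between `b` and `d` shares the link between `x` and `y` with it.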
Appendix I Proof of Theorem 4
We will need one more lemma before proving the main theorem. This lemma simply uses Lemma 5 to show that Step 5 identifies the correct set of nodes.
Lemma 7.
Consider the set of nodes obtained in Step 5 of Algorithm 2. A leaf node is in if and only if shares the same parent as .
Proof.
By Lemma 5, nodes and pass the test of Algorithm 3 if and only if they share the same parent. Step 3 collects all the nodes that pass the test into and ignores any node that does not. Hence, we obtain the required set . ∎
I-A Proof of Theorem 4
By Lemmas 4 and 7, Steps 5 and 6 identify a group of sibling nodes such that removing them makes their parent a leaf node. Steps 7 and 8 produce the matrix of a tree with the siblings of pruned. The new matrix so formed corresponds to such a tree because the interference of tunnels starting or ending at the node is exactly the same as that of the tunnels starting or ending at node when its siblings are removed. Meanwhile, the pruned portion of the tree is recreated in at every iteration of Step 5. Hence, when all the nodes in G are removed, the complete graph is created in .
Appendix J Proof of Lemma 6
Let where . Assume that the correct ordering of the nodes in the ring is . We want to show that tunnels from node 1 to 2 and 1 to n intersect with fewer tunnels than any other tunnel that starts at node 1.
We begin by showing that the tunnel intersects with fewer tunnels than tunnel . These tunnels share the links and . So any tunnel passing through these links intersects with both tunnels. Also, because of symmetry, the number of tunnels intersecting with tunnel only at link is equal to the number of tunnels intersecting with tunnel only at link . The tunnel does not intersect with tunnel ; however, it intersects with tunnel only at link . Hence tunnel intersects with at least one more tunnel than tunnel . Clearly, any longer tunnel starting at node 1 must interfere with even more tunnels.
The ring has at least 5 nodes and the network uses shortest path routing, so we can apply the same argument as above to show that also intersects with the fewest tunnels among the tunnels starting at node 1 and passing through link . Since all tunnels that start at node have to pass through either or , these two must be, among the tunnels that start at node 1, the ones that intersect with the fewest other tunnels.
Because of symmetry this property holds for tunnels starting at every node in the network. This completes the proof.
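The property can be checked numerically on a small ring. The sketch below is an illustration under simplifying assumptions, not the authors' Algorithm 4: overlay nodes are identified with the ring nodes 0 to n-1, links are undirected, there is one tunnel per unordered node pair, and n is taken to be odd so that every shortest path is unique. The function names are hypothetical.

```python
from itertools import combinations

def ring_path_links(i, j, n):
    """Links (as frozensets) on the shorter arc between ring nodes i and j."""
    cw = (j - i) % n  # hops from i to j in the increasing direction
    if cw <= n - cw:
        start, hops = i, cw
    else:
        start, hops = j, n - cw
    return {frozenset(((start + k) % n, (start + k + 1) % n))
            for k in range(hops)}

def interference_counts(n):
    """For every tunnel, the number of other tunnels sharing a link with it."""
    tunnels = {pair: ring_path_links(*pair, n)
               for pair in combinations(range(n), 2)}
    return {t: sum(1 for u, other in tunnels.items()
                   if u != t and links & other)
            for t, links in tunnels.items()}

def inferred_neighbors(node, n):
    """Endpoints of the two tunnels at `node` with the fewest interferences."""
    counts = interference_counts(n)
    mine = sorted((c, t) for t, c in counts.items() if node in t)
    return {a if b == node else b for _, (a, b) in mine[:2]}
```

In this model the two least-interfering tunnels at each node lead to its two ring neighbors, which is the fact the ring recovery algorithm relies on to order the nodes.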
- [1] Vardi, Y., 1996. Network tomography: Estimating source-destination traffic intensities from link data. Journal of the American Statistical Association, 91(433), pp.365-377.
- [2] Castro, R., Coates, M., Liang, G., Nowak, R., and Yu, B. Network tomography: Recent developments. Statistical Science, 2004.
- [3] Spring, Neil, Ratul Mahajan, and David Wetherall. "Measuring ISP topologies with Rocketfuel." ACM SIGCOMM Computer Communication Review 32.4 (2002): 133-145.
- [4] B. Donnet, P. Raoult, T. Friedman and M. Crovella, "Deployment of an Algorithm for Large-Scale Topology Discovery," in IEEE Journal on Selected Areas in Communications, Dec. 2006.
- [5] Gunes, Mehmet Hadi, and Kamil Sarac. "Resolving anonymous routers in internet topology measurement studies." INFOCOM 2008. The 27th Conference on Computer Communications. IEEE, 2008.
- [6] Rabbat, Michael, Robert Nowak, and Mark Coates. "Multiple source, multiple destination network tomography." INFOCOM 2004. Twenty-third Annual Joint Conference of the IEEE Computer and Communications Societies. Vol. 3. IEEE, 2004.
- [7] Mark Coates, Rui Castro, Robert Nowak, Manik Gadhiok, Ryan King, and Yolanda Tsang. Maximum likelihood network topology identification from edge-based unicast measurements. In Proceedings of the ACM SIGMETRICS, 2002.
- [8] Mark Coates, Michael Rabbat, and Robert Nowak. Merging logical topologies using end-to-end measurements. In Proceedings of the 3rd ACM SIGCOMM, 2003.
