Distributionally Robust Removal of Malicious Nodes from Networks
Sixie Yu, Yevgeniy Vorobeychik

TL;DR
This paper introduces a distributionally robust method for removing malicious nodes in networks, addressing uncertainty in maliciousness estimates, and demonstrates its effectiveness through theoretical and empirical analysis.
Contribution
It develops a novel distributionally robust framework for node removal, overcoming limitations of prior methods that assume accurate maliciousness probabilities.
Findings
The proposed algorithm is highly effective in practice.
It outperforms existing methods in robustness.
The approach is validated on synthetic and real data.
Abstract
An important problem in networked systems is detection and removal of suspected malicious nodes. A crucial consideration in such settings is the uncertainty endemic in detection, coupled with considerations of network connectivity, which impose indirect costs from mistakenly removing benign nodes as well as failing to remove malicious nodes. A recent approach proposed to address this problem directly tackles these considerations, but has a significant limitation: it assumes that the decision maker has accurate knowledge of the joint maliciousness probability of the nodes on the network. This is clearly not the case in practice, where such a distribution is at best an estimate from limited evidence. To address this problem, we propose a distributionally robust framework for optimal node removal. While the problem is NP-Hard, we propose a principled algorithmic technique for solving it…
| Network | Power-law exponent | Density | #Edges | Clustering coeff. |
|---|---|---|---|---|
| BA-1 | 2.7167 | 0.0461 | 375 | 0.1340 |
| BA-2 | 2.2789 | 0.0610 | 496 | 0.1504 |
| BA-3 | 2.0374 | 0.0757 | 615 | 0.1646 |
| SW-1 | | 0.0787 | 640 | 0.3664 |
| SW-2 | | 0.1102 | 896 | 0.3875 |
| SW-3 | | 0.1575 | 1280 | 0.4059 |
| Facebook | | 0.0106 | 1325 | 0.3930 |
Taxonomy
Topics: Facility Location and Emergency Management · Infrastructure Resilience and Vulnerability Analysis · Risk and Portfolio Optimization
Preliminary work. Under review by the International Conference on Machine Learning (ICML). Do not distribute.
Abstract
An important problem in networked systems is detection and removal of suspected malicious nodes. A crucial consideration in such settings is the uncertainty endemic in detection, coupled with considerations of network connectivity, which impose indirect costs from mistakenly removing benign nodes as well as failing to remove malicious nodes. A recent approach proposed to address this problem directly tackles these considerations, but has a significant limitation: it assumes that the decision maker has accurate knowledge of the joint maliciousness probability of the nodes on the network. This is clearly not the case in practice, where such a distribution is at best an estimate from limited evidence. To address this problem, we propose a distributionally robust framework for optimal node removal. While the problem is NP-Hard, we propose a principled algorithmic technique for solving it approximately based on duality combined with Semidefinite Programming relaxation. A combination of both theoretical and empirical analysis, the latter using both synthetic and real data, provides strong evidence that our algorithmic approach is highly effective and, in particular, is significantly more robust than the state of the art.
1 Introduction
One of the major problems in networked settings is to identify and remove potentially malicious nodes. For example, in social networks, malicious nodes may correspond to accounts created by malicious parties which spread social spam, hate speech, fake news, and the like, with considerable deleterious effects Allcott & Gentzkow (2017); Cheng et al. (2015). Major social network platforms consequently devote considerable efforts to identify and remove fake or malicious accounts Rodriguez (2018); Scott & Isaac (2017). Nevertheless, evidence suggests that the problem remains pervasive Andrade (2018); Narayanan et al. (2018). Similarly, in cyber-physical systems (e.g., smart grid infrastructure), computing nodes compromised by malware can cause catastrophic losses, and mitigation through detection and removal of such malicious nodes is a major problem Mo et al. (2012); Yang et al. (2017).
A central challenge faced in deciding which potentially malicious nodes to remove is to account for the combination of uncertainty about whether particular nodes are malicious, and the indirect (network) effects of the decision. This combination makes the decision about which nodes to remove fundamentally a subset selection problem—a challenging combinatorial optimization problem. Recently, Yu & Vorobeychik proposed an approach for solving it they term MINT, where the problem is captured by approximately minimizing loss which involves three terms: direct loss from removing benign nodes, indirect loss from cutting links in the benign subgraph, and indirect loss from maintaining connectivity between malicious and benign nodes.
This model is illustrated in Fig. 1, where we consider removing Jack and Emma, two benign nodes above the dotted blue line (and failing to remove the malicious node). Suppose that we pay a penalty of $\alpha_1$ for each benign node we remove, a penalty of $\alpha_2$ for each link we cut between benign nodes, and a penalty of $\alpha_3$ for each link between remaining malicious nodes and benign nodes. Since we remove 2 benign nodes, cut 3 links between benign nodes (one between Emma and Rachel, one between Emma and Ryan, and another between Jack and Ryan), and the malicious node is still connected to 2 nodes (Rachel and Nancy), our total loss is $2\alpha_1 + 3\alpha_2 + 2\alpha_3$.
A major shortcoming of MINT is that it assumes that the distribution of node maliciousness is known. In practice, such a distribution is estimated from limited evidence, such as node behavior and other characteristics, and this estimation may be quite inaccurate (particularly if our modeling assumptions are poor, for example, if we erroneously assume that maliciousness probabilities of nodes are independent). More precisely, consider an unknown ground-truth distribution, as illustrated in Fig. 1 in green. Whereas MINT assumes we know this distribution, in reality we only have an estimate of it (shown in red in Fig. 1). To address this issue, we propose a new approach, MINT_DRO, which is a distributionally robust framework for optimal node removal. We design an uncertainty set around the estimate and optimize with respect to the worst-case scenario. We propose a principled algorithmic approach for solving this problem approximately based on duality combined with Semidefinite Programming relaxation, and prove that the uncertainty set in our model contains the ground-truth distribution with high probability. This in turn implies that with high probability MINT_DRO is robust with respect to the ground-truth distribution. Finally, we conducted extensive experiments using both synthetic and real data to show that our model is significantly more robust than MINT.
Related Work
There are several prior efforts considering a related problem of graph scan statistics and hypothesis testing Arias-Castro et al. (2011); Priebe et al. (2005); Sharpnack et al. (2013). These study the following problem: given a graph where each node is associated with a random variable with an exogenously specified probability distribution, find a subset of nodes that maximizes a scan statistic defined over subsets of nodes (for example, this statistic may generalize log-likelihood ratio). The recent MINT approach Yu & Vorobeychik (2018) can be viewed through this lens as well, but as it has been shown to have state-of-the-art performance, our experimental evaluation focuses on comparing to MINT.
Also closely related to our problem is the broader literature on distributionally robust optimization (DRO) Scarf (1958). In the DRO framework one defines a set of probability distributions that is assumed to contain the true stochastic model of the problem. Many solutions have been proposed to solve specific problems under the DRO framework Xu & Mannor (2010); Calafiore & El Ghaoui (2006); Yue et al. (2006); Cheng et al. (2014); Wiesemann et al. (2014), although this framework has not been applied in the context of choosing which potentially malicious nodes to remove from a network.
Our design of the uncertainty set is inspired by the idea of moment-constrained uncertainty sets Delage & Ye (2010); Popescu (2007); Calafiore & El Ghaoui (2006). Yet another related research strand uses Semidefinite Programming (SDP) to approximate combinatorial optimization problems Goemans & Williamson (1995); Luo et al. (2010); Bertsimas & Sethuraman (2000), although such approaches are domain specific. Finally, our work bears some relationship to the burgeoning field of adversarial machine learning Vorobeychik & Kantarcioglu (2018), although we do not explicitly consider issues of adversarial response (such as evasion attacks) in our setting.
2 Model
We consider a network that is represented by a graph consisting of a set of nodes and a set of edges connecting them. Each node represents a user and each edge represents a connection (e.g., friendship on Facebook) between two users. We focus our attention on undirected graphs and describe the graph by its adjacency matrix. The elements of the adjacency matrix are binary if the graph is unweighted, or non-negative real numbers if the graph is weighted. To make exposition easier we focus on unweighted graphs. Generalization to weighted graphs is straightforward.
We consider the problem of removing malicious nodes from the network. A configuration of the network is a binary vector with one entry per node, where an entry of 1 indicates that the corresponding node is malicious and an entry of 0 indicates that it is benign. For convenience, we also use the complementary indicator, which equals 1 when the node is benign. Consequently, a configuration (and its complement) assigns a malicious or benign label to each node. The identity of malicious and benign nodes is usually uncertain, so instead we have a probability distribution over the configurations. Formally, the configuration is drawn from a joint probability distribution over node configurations.
Our work builds upon the following model proposed by Yu & Vorobeychik (2018). Let denote the set of nodes to remove. Define a vector , where if and only if node is removed (), and if node remains in the network (). The goal of their model is to identify a subset of nodes to remove so as to minimize the impact of the remaining malicious nodes on the network, while at the same time minimizing disruptions caused to the benign subnetwork. This goal is naturally captured by the loss function given in Eq. (1).
[TABLE]
As we can observe, the loss function is composed of three components. The first component, , of the loss function is the direct loss associated with removing benign nodes. The second component, , penalizes cutting connections between benign nodes that are removed and benign nodes that remain; in other words, it penalizes the degradation of connectivity within the benign subgraph. The third component of the loss function, , captures the consequence of failing to remove malicious nodes in terms of connections from these to benign nodes. The nonnegative trade-off parameters , , and satisfy , and weigh the relative importance of the three components of the loss function.
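To make the three components concrete, the following minimal sketch counts them for a single, fully known configuration. It is not the expected loss of Eq. (1): the function name `mint_loss`, the 0/1 encodings, the node label `Mal`, and the toy graph (built only from the edges named in the Fig. 1 discussion) are assumptions made for illustration. In particular, the third component is assumed to count links between remaining malicious and remaining benign nodes.

```python
def mint_loss(adj, removed, malicious, alphas):
    """Loss for one known configuration (illustrative, not Eq. (1) itself).

    adj       -- symmetric 0/1 adjacency matrix (list of lists)
    removed   -- removed[i] == 1 if node i is removed
    malicious -- malicious[i] == 1 if node i is malicious
    alphas    -- (a1, a2, a3) trade-off weights
    """
    n = len(adj)
    a1, a2, a3 = alphas
    # Component 1: benign nodes that were removed.
    direct = sum(1 for i in range(n) if removed[i] and not malicious[i])
    cut = 0      # Component 2: benign-benign links cut by the removal.
    exposed = 0  # Component 3: links from remaining malicious to remaining benign nodes.
    for i in range(n):
        for j in range(i + 1, n):  # visit each undirected edge once
            if not adj[i][j]:
                continue
            if not malicious[i] and not malicious[j]:
                if removed[i] != removed[j]:
                    cut += 1
            elif malicious[i] != malicious[j]:
                if not removed[i] and not removed[j]:
                    exposed += 1
    return a1 * direct + a2 * cut + a3 * exposed, (direct, cut, exposed)


# Hypothetical toy graph using the names from the Fig. 1 discussion.
names = ["Jack", "Emma", "Rachel", "Ryan", "Nancy", "Mal"]
idx = {name: k for k, name in enumerate(names)}
edges = [("Emma", "Rachel"), ("Emma", "Ryan"), ("Jack", "Ryan"),
         ("Mal", "Rachel"), ("Mal", "Nancy")]
adj = [[0] * len(names) for _ in names]
for u, v in edges:
    adj[idx[u]][idx[v]] = adj[idx[v]][idx[u]] = 1

removed = [1, 1, 0, 0, 0, 0]    # Jack and Emma are removed
malicious = [0, 0, 0, 0, 0, 1]  # only "Mal" is malicious
loss, counts = mint_loss(adj, removed, malicious, (1 / 3, 1 / 3, 1 / 3))
```

With equal weights, the three counts simply add up and are scaled by one third; other weightings shift the balance between over-removal and under-removal.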
The configuration is a random variable distributed according to . Let and denote its mean and covariance, respectively. The loss function defined in Eq. (1) depends on both and . To make the dependency explicit we define several matrices and re-write the loss function in a matrix-vector form. We define the matrices as follows. (Footnote 1 defines an operator that returns a diagonal matrix with the given diagonal elements.)
[TABLE]
Note that the elements of these matrices are not constant, but depend on and (see the appendix for their detailed dependency).
Slightly abusing notation, we define two additional matrices, and . Note that is a symmetric matrix:
[TABLE]
and . We can now rewrite the loss function in a compact matrix-vector form as the following:
[TABLE]
Optimizing the loss function above (as done by Yu & Vorobeychik) critically assumes that the maliciousness distribution is known. In reality, this is typically not the case, and such a distribution is estimated from data. The mean of the estimated distribution is denoted by $\hat{\mathbf{\mu}}$, whose $i$-th entry is the estimated probability that node $i$ is malicious given its features from past data. Similarly, the estimated covariance matrix is represented by $\hat{\mathbf{\Sigma}}$. The model proposed by Yu & Vorobeychik is called MINT, which solves the following optimization problem:
[TABLE]
Although MINT has been shown to perform well on several real-world datasets, its performance is strongly influenced by the estimation error of . In fact, in Section 5 we show that even a small estimation error can severely undermine the performance of MINT.
In order to mitigate the sensitivity of MINT to estimation error, we propose a novel Distributionally Robust Optimization (DRO) approach for solving the problem posed above. The general idea is to design a distributional set to capture the uncertainty about the estimated mean and make decisions considering the worst-case scenario. Specifically, we propose a model named MINT_DRO, which aims to solve the following optimization problem:
[TABLE]
where the set captures uncertainty about the true mean . There are several fundamental differences between MINT_DRO and MINT. First, there is an additional inner maximization problem in MINT_DRO. The inner maximization is optimized over a set , which contains a set of probability distributions, where is any distribution sampled from , and are random variables distributed according to . Inspired by Delage & Ye (2010) and Cheng et al. (2014), we parametrize the set by the first and second moments of the distributions in it. Specifically, let be any distribution in . Consider the following two constraints:
[TABLE]
where $\hat{\mathbf{\mu}}$ and $\hat{\mathbf{\Sigma}}$ are the mean and covariance matrix estimated from data, and $\mathbf{\mu}_{\mathcal{F}}$ are random variables distributed according to $\mathcal{F}$. The first constraint defines an ellipsoid, which indicates that the expectation of $\mathbf{\mu}_{\mathcal{F}}$ lies in the ellipsoid centered at the estimate $\hat{\mathbf{\mu}}$. The size of this ellipsoid is determined by $\gamma_1$, which provides a natural measure to quantify our uncertainty about the true mean given its estimate. Note that the first constraint also defines the support of the distribution $\mathcal{F}$. The second constraint enforces the covariance matrix of $\mathbf{\mu}_{\mathcal{F}}$ to lie in a positive semi-definite cone. Intuitively, the second constraint captures how likely it is that the random variable $\mathbf{\mu}_{\mathcal{F}}$ is close to $\hat{\mathbf{\mu}}$. The uncertainty set is then characterized by Eq. (3):
[TABLE]
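As a rough illustration of how membership in such a moment-based set could be checked, the sketch below tests a discrete candidate distribution against two constraints of the standard Delage & Ye (2010) form: a Mahalanobis bound on the mean with radius `gamma1`, and a semidefinite bound `gamma2 * Sigma_hat` on the second moment around the estimated mean. The exact form of Eq. (2) is assumed rather than taken from the text, the estimated covariance is assumed to be diagonal, and all function and argument names are hypothetical.

```python
import math


def _is_psd(matrix, tol=1e-9):
    """Return True if matrix + tol * I admits a Cholesky factorization."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = matrix[i][j] + (tol if i == j else 0.0)
            acc -= sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    return False
                lower[i][i] = math.sqrt(acc)
            else:
                lower[i][j] = acc / lower[j][j]
    return True


def check_moment_constraints(points, weights, mu_hat, var_hat, gamma1, gamma2):
    """Check both moment constraints for a discrete distribution.

    points/weights -- support points and probabilities of the candidate
    mu_hat/var_hat -- estimated mean and (assumed diagonal) covariance
    Returns a pair of booleans: (mean constraint, second-moment constraint).
    """
    n = len(mu_hat)
    pairs = list(zip(points, weights))
    mean = [sum(w * p[i] for p, w in pairs) for i in range(n)]
    # Constraint 1: the mean lies in the ellipsoid centered at mu_hat.
    dist = sum((mean[i] - mu_hat[i]) ** 2 / var_hat[i] for i in range(n))
    # Constraint 2: gamma2 * Sigma_hat minus the second moment is PSD.
    gap = [[(gamma2 * var_hat[i] if i == j else 0.0)
            - sum(w * (p[i] - mu_hat[i]) * (p[j] - mu_hat[j]) for p, w in pairs)
            for j in range(n)] for i in range(n)]
    return dist <= gamma1, _is_psd(gap)


mu_hat, var_hat = [0.2, 0.7], [0.01, 0.04]
near = check_moment_constraints([[0.25, 0.7]], [1.0], mu_hat, var_hat, 1.0, 2.0)
far = check_moment_constraints([[0.5, 0.7]], [1.0], mu_hat, var_hat, 1.0, 2.0)
```

A point mass close to the estimate passes both checks, while one far from it fails both, which matches the intuition that the two parameters control how far candidate distributions may stray from the estimate.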
The set is always non-empty, since it must contain the distribution . In Section 4 we provide probabilistic arguments to show that contains ground-truth distribution with high probability, which guarantees that with high probability our model MINT_DRO is robust with respect to the ground-truth distribution . The choice of the two parameters $\gamma_1$ and $\gamma_2$ is important for the robustness of MINT_DRO. If their values are too small, the benefit from the distributionally robust formulation is limited. In the extreme case where $\gamma_1$ and $\gamma_2$ are zero, our model MINT_DRO reverts to MINT. On the other hand, if their values are too large, our model would make excessively conservative decisions. In Section 4 we show how to make a sensible choice of these values.
3 Solution Approach
In this section we derive the algorithm to solve our model MINT_DRO. The optimization problem of MINT_DRO is a binary quadratic program, which is difficult to solve even if the loss function is convex. Additionally, in our problem the loss function is nonconvex since the matrix in its quadratic term is usually not positive (semi)-definite, further complicating the situation. Indeed, given that MINT, which was shown by Yu & Vorobeychik to be NP-Hard, is a special case, the following result is immediate.
Theorem 1.
Solving MINT_DRO is NP-Hard.
In what follows, we derive an approximation approach for solving MINT_DRO. We first apply duality to transform the inner maximization into a minimization problem, which can be jointly minimized with the outer minimization. At this stage the optimization problem is still an NP-hard combinatorial optimization problem. Next, we apply Semidefinite Programming (SDP) to obtain a convex relaxation of our problem which can be solved efficiently.
The support of the distributions in the uncertainty set is $\mathcal{S}$, which is defined as $\mathcal{S}:=\big\{\mathbf{\mu}_{\mathcal{F}}\,\big|\,(\mathbf{\mu}_{\mathcal{F}}-\hat{\mathbf{\mu}})^{T}\hat{\mathbf{\Sigma}}^{-1}(\mathbf{\mu}_{\mathcal{F}}-\hat{\mathbf{\mu}})\leq\gamma_{1}\big\}$, where the subscript of $\mathbf{\mu}_{\mathcal{F}}$ indexes the distribution associated with this random variable. Note that $\mathbf{\mu}_{\mathcal{F}}\in\mathcal{S}$ is sufficient for the first constraint in Eq. (2) to be true, since the expectation of $\mathbf{\mu}_{\mathcal{F}}$ is a convex combination of the instantiations of $\mathbf{\mu}_{\mathcal{F}}$ and $\mathcal{S}$ is a convex set. We rewrite the inner maximization problem as Eq. (4):
[TABLE]
The constraint Eq. (4b) ensures that $\mathcal{F}$ is a valid probability distribution. The constraints Eq. (4c) guarantee that $\mathcal{F}$ is in the uncertainty set. The constraint Eq. (4d) ensures that any random variable $\mathbf{\mu}_{\mathcal{F}}$ must reside in $\mathcal{S}$. Consequently, this constraint is actually an infinite dimensional constraint on the optimizer. Later we introduce a technique called S-Lemma to convert it to a finite dimensional constraint. We derive the Lagrangian of Eq. (4), where we temporarily omit constraint Eq. (4d), and pull the terms that do not depend on the integration variable out of the integral:
[TABLE]
where , and is a real symmetric positive semi-definite matrix, and returns the trace of the matrix . where holds, since otherwise the solution to Eq.(4) is unbounded.
By duality (Shapiro, 2001; Delage & Ye, 2010; Cheng et al., 2014), the dual problem of Eq. (4) is formulated as the following minimization problem:
[TABLE]
where is the positive semi-definite cone. Strong duality holds between Eq. (5) and the original inner maximization problem. This is because for any and , the estimated distribution is always in the relative interior of . Consequently, by Proposition 3.4 in Shapiro (2001) strong duality holds. Since Eq. (5) is a minimization problem, we can jointly minimize it with the outer minimization over , which results in the following:
[TABLE]
where constraint Eq. (6b) is equivalent to . We write it this way in order to emphasize its quadratic form. Constraints Eq.(6b) and (6c) are infinite dimensional constraints. We apply a technique called S-Lemma to transform them to finite dimensional constraints. We first introduce the S-Lemma:
Lemma 3.1 (S-Lemma, Boyd & Vandenberghe (2004)).
Let , , , where is the subspace of symmetric matrices in . Suppose there exists an such that: . Then the following implication holds for any :
[TABLE]
*if and only if, . *
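For reference, the standard statement of the S-Lemma for two quadratic functions, as given in Appendix B.2 of Boyd & Vandenberghe (2004), is reproduced below. The symbols $A_i$, $b_i$, $c_i$, $\hat{x}$, and $\lambda$ are the textbook's generic notation and may differ from the notation and sign conventions used in Lemma 3.1.

```latex
Let $A_1, A_2 \in \mathbb{S}^n$, $b_1, b_2 \in \mathbb{R}^n$, $c_1, c_2 \in \mathbb{R}$,
and suppose there exists an $\hat{x}$ with
$\hat{x}^T A_1 \hat{x} + 2 b_1^T \hat{x} + c_1 < 0$. Then the implication
\[
x^T A_1 x + 2 b_1^T x + c_1 \le 0
\;\Longrightarrow\;
x^T A_2 x + 2 b_2^T x + c_2 \le 0
\qquad \text{for all } x \in \mathbb{R}^n
\]
holds if and only if there exists $\lambda \ge 0$ such that
\[
\begin{bmatrix} A_2 & b_2 \\ b_2^T & c_2 \end{bmatrix}
\preceq
\lambda \begin{bmatrix} A_1 & b_1 \\ b_1^T & c_1 \end{bmatrix}.
\]
```

The value of the result is that a statement quantified over infinitely many points $x$ is replaced by a single linear matrix inequality in the scalar $\lambda$.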
Note that S-Lemma only requires and to be real symmetric matrices. In order to apply S-Lemma we need to have two quadratic functions. Constraint Eq. (6b) is a quadratic function in . Thus, what remains is to convert Eq. (6c) to a quadratic function in . Recall that the term, in , is implicitly a quadratic function of . We re-formulate and according to , which results in Eq.(7) (see the Appendix for details about this reformulation):
[TABLE]
where returns a diagonal matrix with diagonal elements equal to . We substitute Eq. (7) back to , which results in the following equivalence:
[TABLE]
where:
[TABLE]
which results in a compact form of :
[TABLE]
Note that for any the inequality in constraint Eq. (6b) is strict when . Consequently, by S-Lemma, for any the implication, , is equivalent to Eq.(8):
[TABLE]
The two infinite dimensional constraints Eq.(6b) and (6c) are thereby converted into a finite dimensional constraint Eq. (8). Additionally, the objective function in Eq. (6) is linear in its optimizer.
The last issue is that we still have two sources of non-convexity in Eq. (6): first, is binary, and second, the constraint represented by Eq. (8) is not convex in because of three terms involving in , and :
[TABLE]
To deal with the first issue, we relax the binary feasible region of the decision vector to its continuous counterpart. To address the second, we next apply SDP relaxation to transform Eq. (6) into a convex optimization problem.
First, let us introduce a matrix . Then the following three relationships hold (see the Appendix for detailed proof):
[TABLE]
One problem is that the feasible regions involving and are nonconvex because of the equality . In order to transform the feasible regions to be convex, we apply a two-step relaxation. The first step is to relax the equality and enforce the diagonal elements of equal to one, which results in: and . This step transforms the feasible region of to a positive semi-definite cone, which is a convex set. However, we still have a nonconvex term . To handle this, in the second step we apply Schur Complement to transform to the linear matrix inequality: . Combining the relationships in Eq. (10) with the results of the two-step relaxation above, the three nonconvex terms in Eq. (9) can be represented as the following convex set:
[TABLE]
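The Schur complement step used in the second relaxation rests on a standard equivalence, stated here with generic placeholder symbols: $Z$ for a symmetric matrix variable and $z$ for a vector. These symbols are illustrative and are not the notation of Eq. (11).

```latex
\[
Z - z z^T \succeq 0
\quad\Longleftrightarrow\quad
\begin{bmatrix} Z & z \\ z^T & 1 \end{bmatrix} \succeq 0 .
\]
```

The left-hand side is nonconvex as written because of the product $z z^T$, whereas the right-hand side is a linear matrix inequality that is jointly convex in $(Z, z)$, which is what allows it to appear in a semidefinite program.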
With a slight abuse of notation, the operator in extracts the diagonal elements of as a column vector. Finally, by substituting , and to the corresponding matrices in Eq. (8) we obtain the following Semidefinite Program which approximately solves MINT_DRO (after we project the optimal solution of this problem into , for example, by rounding):
[TABLE]
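The final projection of the relaxed solution onto binary decisions can be sketched as below. The text only says the projection is done "for example, by rounding"; the entrywise threshold of 0.5 and the name `round_to_binary` are assumptions for illustration.

```python
def round_to_binary(relaxed, threshold=0.5):
    """Project a relaxed solution onto {0, 1} entrywise by thresholding.

    relaxed   -- list of values in [0, 1] returned by a relaxed solver
    threshold -- cut-off for deciding removal (0.5 is an assumed default)
    """
    return [1 if value >= threshold else 0 for value in relaxed]


# Hypothetical relaxed solution for four nodes.
decisions = round_to_binary([0.93, 0.08, 0.51, 0.49])
```

Entries near the threshold are where rounding is most likely to change the loss, so in practice it may be worth comparing the loss of a few candidate thresholds rather than fixing one.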
4 Theoretical Analysis
In this section we present a probabilistic argument that the uncertainty set defined in Eq. (3) contains the ground-truth distribution with high probability. This, in turn, implies that with high probability our model MINT_DRO is robust with respect to the unknown ground-truth distribution.
We show that the ground-truth distribution belongs to with high probability in two steps, arguing first that (C1) and, subsequently, that (C2) below hold with high probability, where (C1) and (C2) are defined as follows:
[TABLE]
[TABLE]
The arguments in the first step are based on Lemma 4.1. Due to space limitations, we defer its proof to the appendix.
Lemma 4.1.
Let and denote the mean and covariance matrix of the ground-truth distribution , and suppose that is estimated from samples, , where is bounded: . Then satisfies the following constraint with probability at least :
[TABLE]
where $\beta(\delta_{1})=\frac{R^{2}}{M}\left(2+\sqrt{2\log\frac{1}{\delta_{1}}}\right)^{2}$.
We assume the estimated covariance matrix is close to . Then, if we let and note that , a direct application of Lemma 4.1 implies that (C1) holds with probability at least .
The arguments in the second step rely on the result due to Delage & Ye (2010):
Lemma 4.2 (Delage & Ye (2010)).
Suppose that is distributed according to , and the mean of the distribution is known and used to formulate the estimated covariance matrix $\hat{\mathbf{\Sigma}}$, which is estimated from $M$ samples: $\hat{\mathbf{\Sigma}}=\frac{1}{M}\sum_{i=1}^{M}\big(\zeta_{i}-\mathbf{\mu}\big)\big(\zeta_{i}-\mathbf{\mu}\big)^{T}$, where is bounded: . Then with probability at least :
[TABLE]
where $\alpha(\delta_{2})=\frac{R^{2}}{\sqrt{M}}\left(\sqrt{1-\frac{N}{R^{4}}}+\sqrt{\log\frac{1}{\delta_{2}}}\right)$, $M>R^{4}\left(\sqrt{1-\frac{N}{R^{4}}}+\sqrt{\log\frac{1}{\delta_{2}}}\right)^{2}$, and $N$ is the dimension of $\zeta$.
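The two expressions in Lemma 4.2 are easy to evaluate numerically. The sketch below assumes that `log` denotes the natural logarithm and that $R^4 \geq N$ so the square root is real; the function names are hypothetical.

```python
import math


def alpha(delta2, radius, num_samples, dim):
    """Evaluate alpha(delta2) = (R^2 / sqrt(M)) * (sqrt(1 - N/R^4) + sqrt(log(1/delta2)))."""
    r4 = radius ** 4
    if r4 < dim:
        raise ValueError("requires R^4 >= N so that 1 - N/R^4 is non-negative")
    term = math.sqrt(1.0 - dim / r4) + math.sqrt(math.log(1.0 / delta2))
    return (radius ** 2 / math.sqrt(num_samples)) * term


def min_samples(delta2, radius, dim):
    """Right-hand side of the sample-size condition M > R^4 * (...)^2."""
    r4 = radius ** 4
    if r4 < dim:
        raise ValueError("requires R^4 >= N so that 1 - N/R^4 is non-negative")
    term = math.sqrt(1.0 - dim / r4) + math.sqrt(math.log(1.0 / delta2))
    return r4 * term ** 2


# Hypothetical values: R = 2, N = 4, delta2 = 0.05.
threshold = min_samples(0.05, 2.0, 4)
a = alpha(0.05, 2.0, 4 * threshold, 4)
```

Comparing the two formulas shows that the sample-size condition is exactly the requirement that `alpha` be smaller than one; quadrupling the number of samples halves `alpha`.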
In order to use Lemma 4.2 we assume that the estimated mean $\hat{\mathbf{\mu}}$ is close to the ground-truth mean. Given this assumption, showing that (C2) holds with high probability is equivalent to showing that the following holds with high probability:
[TABLE]
by Lemma 4.2, the above is true with high probability when: . Consequently, by setting , such that the effects of are negligible, we conclude that (C2) holds with probability at least .
Finally, by a union bound we obtain probabilistic guarantees that the uncertainty set contains .
Theorem 2.
With probability at least , where , the uncertainty set defined in Eq. (3) contains the ground-truth distribution .
Proof.
The detailed proof is deferred to the appendix. ∎
We now demonstrate how to utilize the probabilistic arguments to make a sensible choice for $\gamma_1$. The value of $\gamma_2$ can be similarly obtained. Note that is necessary for (C1) to hold. Consider a network with nodes. Assume is diagonal with diagonal elements equal to , which is reasonable when a single estimator is used to estimate and the maliciousness probabilities of nodes are independent. A reasonable estimate of $R$ is the radius of the circumscribed sphere of a hypercube with side length equal to one. If and , then . Therefore in order for to contain with probability , we need . Similarly, for a network with nodes, we want .
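The recipe above can be traced numerically. The sketch below uses the geometric fact that a unit hypercube in $N$ dimensions has a circumscribed sphere of radius $\sqrt{N}/2$, and evaluates $\beta(\delta_1)$ from Lemma 4.1 with the natural logarithm. The network size, sample count, and confidence level are hypothetical values chosen for illustration, not the ones used in the paper.

```python
import math


def circumradius_unit_hypercube(dim):
    """Half the main diagonal of a unit hypercube in `dim` dimensions."""
    return math.sqrt(dim) / 2.0


def beta(delta1, radius, num_samples):
    """Evaluate beta(delta1) = (R^2 / M) * (2 + sqrt(2 * log(1/delta1)))^2."""
    return (radius ** 2 / num_samples) * (2.0 + math.sqrt(2.0 * math.log(1.0 / delta1))) ** 2


# Hypothetical setting: 100 nodes, 1000 samples, 95% confidence.
n_nodes, num_samples, delta1 = 100, 1000, 0.05
radius = circumradius_unit_hypercube(n_nodes)
b = beta(delta1, radius, num_samples)
```

Because $R^2 = N/4$ under this estimate, the bound grows linearly with the number of nodes and shrinks linearly with the number of samples, which indicates how the ellipsoid parameter should scale across network sizes.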
5 Experiments
In this section we present experimental results to show the effectiveness of our approach. Our experiments were conducted on both synthetic and real-world network structures, although in all cases the distribution over maliciousness of nodes was derived using real data. We considered two types of network generative models to construct synthetic networks: Barabási-Albert (BA) Barabási & Albert (1999) and Watts-Strogatz networks (Small-World) Watts & Strogatz (1998). BA is characterized by its power-law degree distribution, where the probability that a randomly selected node has neighbors is proportional to . For the BA model we experimented with three variants, BA-1, BA-2, and BA-3, which differ in the value of the exponent of their power-law degree distributions. For Small-World networks we also experimented with three variants, SW-1, SW-2, and SW-3, that have different local clustering coefficients. For both networks we generated instances with nodes. For real-world networks, we used a network extracted from Facebook data Leskovec & Mcauley (2012) which consisted of nodes and edges. We experimented with randomly sampled sub-networks with nodes. Due to space limitations, the statistics of the networks used in our experiments are listed in the appendix.
For fair comparison with MINT (the state-of-the-art alternative), we used the same experimental setup as Yu & Vorobeychik (2018). In all of our experiments, we derived the ground-truth distribution as follows. We start with a dataset which includes malicious and benign instances (the meaning of these designations is domain specific). The dataset is partitioned into three subsets: , and , with the ratio of . Our first step is to learn a probabilistic predictor of maliciousness as a function of a feature vector , , on . Then we randomly assign malicious and benign feature vectors from to the nodes on the network, assigning of nodes with malicious features and with benign feature vectors. For each node we use its assigned feature vector to obtain our estimated probability of this node being malicious, . This gives us the estimated maliciousness probability distribution . This is the distribution used to solve the model MINT, and also the distribution used to construct the uncertainty set in our model. To ensure that our evaluation reasonably reflects realistic limitations of the knowledge about the ground-truth distribution , we train another predictor using . Applying this new predictor to the nodes and their assigned feature vectors, we obtain a distribution which we use to evaluate effectiveness.
We conducted three sets of experiments. In the first set of experiments we used synthetic networks and used data from the Spam Cormack et al. (2008) dataset. To simulate estimation error of , we add white Gaussian noise to the evaluation distribution . The standard deviation of the noise is increased from to to simulate different magnitudes of the estimation error.
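The noise-injection step can be sketched as follows. Independent zero-mean Gaussian noise is added to each node's maliciousness probability; clipping the result back to $[0, 1]$ is an assumption made here so the values remain valid probabilities, since the text does not say how out-of-range values are handled. The function name and the seed argument are likewise illustrative.

```python
import random


def add_gaussian_noise(probs, sigma, seed=0):
    """Perturb each probability with N(0, sigma^2) noise, then clip to [0, 1].

    probs -- list of per-node maliciousness probabilities
    sigma -- standard deviation of the white Gaussian noise
    seed  -- seed for a private random generator (for reproducibility)
    """
    rng = random.Random(seed)
    noisy = [p + rng.gauss(0.0, sigma) for p in probs]
    return [min(1.0, max(0.0, q)) for q in noisy]


# Hypothetical probabilities for three nodes, perturbed at two noise levels.
base = [0.1, 0.5, 0.9]
unchanged = add_gaussian_noise(base, 0.0)
perturbed = add_gaussian_noise(base, 0.3, seed=1)
```

Sweeping `sigma` over a range of values then yields a family of evaluation distributions with progressively larger deviation from the one used for optimization.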
In the second set of experiments we used real-world networks from Facebook and used Hate Speech data Davidson et al. (2017) collected from Twitter to obtain as discussed above. We categorized this dataset into two classes in terms of whether a tweet represents Hate Speech. After categorization, the total number of tweets is , of which are Hate Speech. We add white Gaussian noise to to simulate estimation error as discussed above. Note that in this set of experiments we used real data for both the networks and the maliciousness probabilities .
In the third set of experiments we considered the scenario that instead of being random, the location of the malicious nodes on the network is strategically determined. This scenario is not vacuous: in reality, for example, the nodes that have high degrees (e.g., celebrities with lots of followers on Twitter) may be targeted in order to maximize the influence of commercial advertisements Kempe et al. (2003). We conducted this set of experiments on synthetic networks. A set of nodes is greedily selected from the network to maximize the number of unique neighbors connecting to them. Then we assign malicious feature vectors to these nodes.
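A minimal sketch of such a greedy selection is given below. It repeatedly picks the node that adds the most not-yet-covered neighbors. The tie-breaking rule (smallest node label first), the choice to count selected nodes as possible neighbors, and the function name are assumptions; the text does not specify these details.

```python
def greedy_max_neighbors(adj_list, k):
    """Greedily choose k nodes to maximize the number of distinct neighbors.

    adj_list -- dict mapping each node to a list of its neighbors
    k        -- number of nodes to select
    Returns the selected nodes (in order) and the set of covered neighbors.
    """
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, -1
        for node in sorted(adj_list):  # fixed order makes ties deterministic
            if node in chosen:
                continue
            gain = len(set(adj_list[node]) - covered)
            if gain > best_gain:
                best, best_gain = node, gain
        if best is None:  # fewer than k nodes available
            break
        chosen.append(best)
        covered |= set(adj_list[best])
    return chosen, covered


# Hypothetical graph with two hubs (nodes 0 and 4).
graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4],
         4: [3, 5, 6, 7], 5: [4], 6: [4], 7: [4]}
selected, reached = greedy_max_neighbors(graph, 2)
```

The selected nodes would then receive malicious feature vectors, which places malicious nodes at high-degree positions and introduces correlation between maliciousness and network structure.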
Experiment Results
We compared our model with a state-of-the-art approach MINT. The average losses for our first set of experiments where was simulated from Spam data are shown in Figures 2 and 3. The experimental results on BA are shown in Figure 2, with the three columns corresponding to BA-1, BA-2 and BA-3, respectively. The experimental results on Small-World are shown in Figure 3, where the three columns correspond to SW-1, SW-2, and SW-3. In both figures, each row corresponds to a combination of trade-off parameters ; for example, corresponds to . Each bar was obtained by averaging over randomly generated network topologies.
The experimental results indicate that on both BA and Small-World networks our model MINT_DRO is significantly more robust than MINT. Additionally, when no noise is added to the evaluation distribution (left-most bars in all subplots), MINT_DRO is more robust than MINT except for a few cases. This indicates that the generalization ability of MINT_DRO is better than that of MINT.
The average loss on Facebook data is shown in Figure 4, with the three columns corresponding to , , and . In this experiment, both the networks and the data used to simulate maliciousness probabilities are real data. Each bar was averaged over randomly sampled networks. Our model MINT_DRO is significantly more robust than MINT except for the cases where no noise is added. In this case, MINT_DRO is only worse than MINT at the left-most bars in the middle figure, although the difference is not significant. This is actually expected since MINT_DRO considers the worst-case scenario, which results in a decision that may be slightly conservative in the no-noise setting. One observation is that the Facebook networks used in this experiment are dramatically different from the simulated networks in terms of graph statistics (see the appendix for the detailed statistics). Particularly, the Facebook networks are disconnected, highly sparse, and have approximately nodes that have zero degree. Therefore the robustness exhibited in Figure 4 provides strong evidence of the effectiveness of MINT_DRO.
The average losses on the third set of experiments are shown in Figures 5 and 6 for BA and Small-World networks, respectively. For both figures the three columns correspond to , , and . The results show that MINT_DRO is more robust than MINT across all settings. Recall that the loss function of MINT and MINT_DRO depends on the estimated covariance matrix , which encodes correlation information of the distribution . When the actual maliciousness of nodes becomes correlated as we simulated in this experiment, the performance of MINT degrades since it is using the estimated distribution which now significantly deviates from the true distribution. When and are appropriately selected, contains the distribution that characterizes the strategic correlation simulated in this experiment, resulting in significantly better robustness.
One may argue that the robustness exhibited in Figures 5 and 6 stems solely from the fact that MINT_DRO is more robust than MINT when no noise is added to , rather than from robustness against correlation in the maliciousness distribution induced by strategic decisions about where to place the malicious nodes. However, consider the left-most bars in the lower-left subplot of Figure 2. In this setting MINT_DRO performs worse than MINT. Now consider another setting where the experimental setup is identical except that the malicious nodes are strategically chosen. This setting corresponds to the left-most bars in the right subplot of Figure 5, where MINT_DRO performs better than MINT. Similar observations can be made on Small-World networks. Consequently, a major advantage of MINT_DRO is its robustness even when the location of the malicious nodes on the graph is itself chosen strategically.
6 Conclusion
We considered the problem of removing malicious nodes from a network under uncertainty. We designed a model that accounts for the uncertainty around the estimated maliciousness probabilities and makes decisions under the worst-case scenario. We then proposed a principled algorithmic technique for solving it approximately, based on duality combined with a semidefinite programming relaxation. We proved theoretically that our model is robust with respect to the ground truth, and showed experimentally that it is more robust than the state of the art.
Appendix
1 Proof of Lemma 4.1
The proof is a generalization of a result proved by Shawe-Taylor & Cristianini. For completeness we list their result in Lemma 1.1.
Lemma 1.1 (Shawe-Taylor & Cristianini, 2003). Assume is a random variable satisfying:
[TABLE]
where the last inequality bounds the support of . Let be a set of independently and randomly sampled instances of . Then with probability at least , the following inequality holds:
[TABLE]
In what follows we prove Lemma 4.1:
Lemma 4.1.
Let and denote the mean and covariance matrix of the ground-truth distribution , and suppose that is estimated from samples, , where is bounded: . Then satisfies the following constraint with probability at least :
[TABLE]
where $\beta(\delta_{1})=\frac{R^{2}}{M}\left(2+\sqrt{2\log\frac{1}{\delta_{1}}}\right)^{2}$.
Proof.
Apply a standardization to the , which results in a new random variable . It is clear that satisfies the conditions of Lemma 1.1. Let $\beta(\delta_{1})=\frac{R^{2}}{M}\left(2+\sqrt{2\log\frac{1}{\delta_{1}}}\right)^{2}$; then we have:
[TABLE]
∎
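To get a feel for how the bound in Lemma 4.1 behaves, the expression for $\beta(\delta_{1})$ can be evaluated numerically. The following is a minimal sketch: the function and argument names are hypothetical, and the logarithm is assumed to be the natural logarithm.

```python
import math

def beta(delta1, radius, num_samples):
    """Evaluate beta(delta1) = (R^2 / M) * (2 + sqrt(2 * log(1 / delta1)))^2.

    delta1      -- failure probability, must lie strictly between 0 and 1
    radius      -- the bound R on the norm of the samples
    num_samples -- the number of samples M
    The natural logarithm is an assumption of this sketch.
    """
    if not 0.0 < delta1 < 1.0:
        raise ValueError("delta1 must lie strictly between 0 and 1")
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    inner = 2.0 + math.sqrt(2.0 * math.log(1.0 / delta1))
    return (radius ** 2 / num_samples) * inner ** 2

# The bound tightens as 1/M and loosens as the confidence level increases.
for m in (100, 400, 1600):
    print(m, beta(0.05, 1.0, m))
```

Because $\beta(\delta_{1})$ scales as $1/M$, quadrupling the number of samples shrinks the bound by a factor of four, while a smaller $\delta_{1}$ enlarges it only through the square-root-of-logarithm term.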
2 Proof of Theorem 2
Theorem 2.
With probability at least , where , the uncertainty set contains the ground-truth distribution .
Proof.
We define two events and as follows:
[TABLE]
Then we have:
[TABLE]
where is the event that . In other words, $\mathbb{P}\big(\mathcal{P}\in\Pi\big)\geq 1-\delta$, which completes the proof. ∎
3 Detailed dependency of on their arguments
In the following we expand the definition of , which makes their dependency on and clear:
[TABLE]
4 Detailed forms of the matrices and
The matrices and defined in the paper have the following forms:
[TABLE]
5 Detailed reformulation of Eq. (7)
In the paper, in order to apply the S-Lemma to convert the two infinite-dimensional constraints, Eq. (6b) and Eq. (6c), to a finite-dimensional constraint, we need two functions in quadratic form. Notice that Eq. (6b) is already a quadratic function in , so what remains is to convert Eq. (6c) to a quadratic function in . We first convert the following to a quadratic function in :
[TABLE]
From the last section we know:
[TABLE]
The three terms (1), (2), and (3), together with , form three quadratic functions in . In what follows, we convert them to quadratic functions in . Note that the operator returns a diagonal matrix with diagonal elements equal to :
[TABLE]
where comes from the fact that . Similarly we have:
[TABLE]
and:
[TABLE]
where comes from the following:
[TABLE]
Putting the above derivation together we obtain:
[TABLE]
So the function becomes:
[TABLE]
which is a quadratic function in . Define and as the following:
[TABLE]
which results in a compact form of :
[TABLE]
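As a sanity check on this kind of rewriting, the sketch below verifies numerically that a quadratic function can be written in a compact homogeneous form, which is the shape in which the S-Lemma is commonly applied. The symbols `A`, `b`, `c`, and `H` and the numeric values are hypothetical and are not the matrices defined in the paper; the lifted form $[x;1]^{\top} H\,[x;1]$ with $H = \begin{pmatrix} A & b/2 \\ b^{\top}/2 & c \end{pmatrix}$ is one standard choice assumed here.

```python
def quadratic(A, b, c, x):
    """Evaluate x^T A x + b^T x + c using plain Python lists."""
    n = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    lin = sum(b[i] * x[i] for i in range(n))
    return quad + lin + c

def homogenize(A, b, c):
    """Build the (n+1) x (n+1) matrix [[A, b/2], [b^T/2, c]]."""
    n = len(b)
    H = [list(A[i]) + [b[i] / 2.0] for i in range(n)]
    H.append([b[j] / 2.0 for j in range(n)] + [c])
    return H

def homogeneous_value(H, x):
    """Evaluate z^T H z for the lifted vector z = [x; 1]."""
    z = list(x) + [1.0]
    n = len(z)
    return sum(z[i] * H[i][j] * z[j] for i in range(n) for j in range(n))

# Hypothetical coefficients chosen only for illustration.
A = [[2.0, 1.0], [1.0, 3.0]]
b = [-1.0, 4.0]
c = 0.5
x = [1.0, -2.0]

direct = quadratic(A, b, c, x)
lifted = homogeneous_value(homogenize(A, b, c), x)
print(direct, lifted)
```

Both evaluations agree for any `x`, so a condition on the quadratic function over all `x` can be restated as a condition on the single matrix `H`.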
6 Proof of Eq. (10) in the paper
The relation is direct. To see why holds, note that the -th element of is:
[TABLE]
which is equal to the -th element of :
[TABLE]
The relation holds because:
[TABLE]
7 Statistics of the networks used in experiments
References
- Allcott, H. and Gentzkow, M. Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31(2):211–36, 2017.
- Andrade, V. Facebook, WhatsApp step up efforts in Brazil’s fake news battle. Bloomberg, 2018. URL https://www.bloomberg.com/news/articles/2018-10-23/facebook-whatsapp-step-up-efforts-in-brazil-s-fake-news-battle.
- Arias-Castro, E., Candes, E. J., and Durand, A. Detection of an anomalous cluster in a network. The Annals of Statistics, pp. 278–304, 2011.
- Barabási, A.-L. and Albert, R. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
- Bertsimas, D. and Sethuraman, J. Moment problems and semidefinite optimization. In Handbook of Semidefinite Programming, pp. 469–509. Springer, 2000.
- Boyd, S. and Vandenberghe, L. Convex Optimization. Cambridge University Press, 2004.
- Calafiore, G. C. and El Ghaoui, L. On distributionally robust chance-constrained linear programs. Journal of Optimization Theory and Applications, 130(1):1–22, 2006.
- Cheng, J., Delage, E., and Lisser, A. Distributionally robust stochastic knapsack problem. SIAM Journal on Optimization, 24(3):1485–1506, 2014.
