Sample Complexity Bounds for Influence Maximization
Gal Sadeh, Edith Cohen, Haim Kaplan

TL;DR
This paper establishes new, tighter bounds on the number of simulations needed to effectively identify influential nodes in networks under stochastic diffusion models, improving efficiency for influence maximization tasks.
Contribution
It introduces a novel upper bound on sample complexity for influence maximization, applicable to a broad class of models including IC and LT, and offers a data-adaptive method for fewer simulations.
Findings
Significantly improves sample complexity bounds for influence maximization.
Provides a data-adaptive method reducing the number of required simulations.
Develops an efficient greedy algorithm for approximate maximization.
Gal Sadeh, Tel Aviv University
Edith Cohen, Google Research and Tel Aviv University
Haim Kaplan, Google Research and Tel Aviv University
Abstract
Influence maximization (IM) is the problem of finding, for a given $s \ge 1$, a set of $s$ nodes in a network with maximum influence. With stochastic diffusion models, the influence of a set of seed nodes is defined as the expectation of its reachability over simulations, where each simulation specifies a deterministic reachability function. Two well-studied special cases are the Independent Cascade (IC) and the Linear Threshold (LT) models of Kempe, Kleinberg, and Tardos [29]. The influence function in stochastic diffusion is unbiasedly estimated by averaging reachability values over i.i.d. simulations. We study the IM sample complexity: the number of simulations needed to determine a $(1-\epsilon)$-approximate maximizer with confidence $1-\delta$. Our main result is a surprising upper bound of $O(s \tau \epsilon^{-2} \ln(n/\delta))$ for a broad class of models that includes IC and LT models and their mixtures, where $n$ is the number of nodes and $\tau$ is the number of diffusion steps. Generally $\tau \ll n$, so this significantly improves over the generic upper bound of $O(s n \epsilon^{-2} \ln(n/\delta))$. Our sample complexity bounds are derived from novel upper bounds on the variance of the reachability that allow for small relative error for influential sets and additive error when influence is small. Moreover, we provide a data-adaptive method that can detect and utilize fewer simulations on models where it suffices. Finally, we provide an efficient greedy design that computes a $(1-1/e-\epsilon)$-approximate maximizer from simulations and applies to any submodular stochastic diffusion model that satisfies the variance bounds.
1 Introduction
Models for the spread of information among networked entities were studied for decades in sociology and economics [24, 18, 27]. A diffusion process is initiated from a seed set of nodes (entities) and progresses in steps: Initially, only the seed nodes are activated. In each step additional nodes may become active based on the current set of active nodes. The progression can be deterministic or stochastic. The -stepped influence of a seed set of nodes is then defined as its expected reachability (total number of active nodes) in steps.
Influence maximization (IM) is the problem of finding a set of nodes of specified cardinality and maximum influence. The IM problem was formulated nearly two decades ago by Richardson and Domingos [16, 38] and inspired by the application of viral marketing. In a seminal paper, Kempe, Kleinberg, and Tardos [29] studied stochastic diffusion models and introduced two elegant special cases, the Independent Cascade (IC) and Generalized Threshold (GT) diffusion models. Their work sparked extensive follow-up research and large-scale implementations [33, 7, 28, 37]. Currently IM is applied in multiple domains with linked entities for tasks as varied as diversity-maximization (the most representative subset of the population) and sensor placement that maximizes coverage [30, 4, 32].
We consider stochastic diffusion models (SDM) over nodes that are specified by a distribution over sets of monotone non-decreasing boolean activation functions
[TABLE]
A diffusion process starts with a seed set of nodes and . At step [math] we activate the seed nodes . The diffusion then proceeds deterministically: At step all active nodes remain active and we activate any inactive node where :
[TABLE]
The -steps reachability set of a seed set is the random variable for and respectively the -steps reachability, , is the random variable that is the number of active nodes for . Finally, the influence value of is defined to be the expectation
[TABLE]
We refer to the case where the diffusion is allowed to progress until there is no growth as unrestricted diffusion and this corresponds to . The influence is a monotone set function. We say that an SDM is submodular when the influence function is submodular and that it is independent if the activation functions of different nodes are independent random variables. The IM problem for seed set size and steps is to find
[TABLE]
The reader might be more familiar with well-studied special cases of this general formulation. Live-edge diffusion models are specified by a graph with nodes and directed edges and a distribution over subsets of "live" edges. When expressed as an SDM, the activation functions that correspond to have if and only if there is an edge from a node in to in the graph . Live-edge models are always submodular: This is because , which is the number of nodes reachable from in by paths of length at most , is a coverage function and hence monotone and submodular. Therefore, so is the influence function , which is an expectation of a distribution over coverage functions. A live-edge model is independent if we only have dependencies between incoming edges to the same node. The Independent Cascade (IC) model is the special case of an independent live-edge model where all edges are independent Bernoulli random variables selected with probabilities ().
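To make the live-edge view concrete, here is a minimal Python sketch (not taken from the paper) of a single IC simulation and of step-limited reachability in the resulting live-edge graph. The function names, the `(u, v, p)` edge-triple representation, and the use of `random.Random` are illustrative assumptions.

```python
import random
from collections import deque

def sample_live_edges(edges, rng):
    """One IC simulation: keep each edge (u, v, p) independently with probability p."""
    return [(u, v) for (u, v, p) in edges if rng.random() < p]

def reachability(live_edges, seeds, tau):
    """Nodes reachable from `seeds` by paths of at most `tau` live edges."""
    out = {}
    for u, v in live_edges:
        out.setdefault(u, []).append(v)
    dist = {s: 0 for s in seeds}       # BFS distance from the seed set
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        if dist[u] == tau:             # step limit reached: do not expand further
            continue
        for v in out.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)
```

The reachability value used in the text corresponds to `len(reachability(live, seeds, tau))` for one simulation `live`; for a fixed `live` it is a coverage function of the seed set.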
Another well-studied class are generalized threshold (GT) models [29, 33]. A GT model is specified by a set of monotone functions . The randomization is specified by a set of threshold values where . The corresponding activation functions to are
[TABLE]
A well-studied subclass are Independent GT (IGT) where we require that are submodular and nodes have independent threshold values . Mossel and Roch [33, 34] proved that IGT models are submodular, which is surprising because the functions are generally not submodular. Their proof was provided for unrestricted diffusion but extends to the case where we stop the process after steps. Finally, Linear threshold (LT) models [24, 29] are a special case of IGT where we have an underlying directed graph and each edge is associated with a fixed weight value so that for all and the functions are defined as the sums . Kempe et al showed [29] that each LT model is equivalent to an independent live-edge model.
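As an illustration of the LT special case, the following Python sketch draws one simulation (independent uniform thresholds) and runs a step-limited diffusion in which a node activates once the total weight of its active in-neighbors reaches its threshold. The data layout (`in_weights` as a dict of `(source, weight)` lists) and the function names are assumptions made for this sketch, and the standard LT convention that the incoming weights of each node sum to at most 1 is assumed.

```python
import random

def lt_simulation(nodes, rng):
    """One LT simulation: an independent uniform [0, 1) threshold per node."""
    return {v: rng.random() for v in nodes}

def lt_reachability(in_weights, thresholds, seeds, tau):
    """Active set after at most `tau` synchronous steps of LT diffusion."""
    active = set(seeds)
    for _ in range(tau):
        newly = {
            v for v in thresholds
            if v not in active
            and sum(w for u, w in in_weights.get(v, []) if u in active) >= thresholds[v]
        }
        if not newly:                  # no growth: unrestricted diffusion has stopped
            break
        active |= newly
    return active
```

All activations in a step are computed from the active set at the start of that step, which matches the synchronous step structure described in the introduction.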
One of the challenges brought on by the IM formulation is computational efficiency. Kempe et al [29] noted that the IM problem generalizes the classic Max Cover problem even with and a live-edge model with a fixed set of live edges ( for all ). Therefore, IM inherits Max Cover’s hardness of approximation for ratio better than [19] for a cover with sets. On the positive side, with submodular models, an approximation ratio of can be achieved by the first nodes of a greedy sequence generated by sequentially adding a node with maximum marginal value [35]. A challenge of applying Greedy with stochastic models, however, is that even point-wise evaluation of the influence function can be computationally intensive. Exact evaluation even for IC models is #P-hard [6]. As for approximation, Kempe et al proposed to work with averaging oracles
[TABLE]
that average the reachability values obtained from a set of i.i.d. simulations. Recall that in the general SDM formulation, a simulation is specified by a set of node activation functions. For live-edge models, a simulation is simply a set of concurrently live edges . In GT models, a simulation is specified by a set of thresholds .
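For live-edge models, an averaging oracle of this kind can be sketched in a few lines of Python. This is an illustrative sketch only: each simulation is assumed to be given as a list of live `(u, v)` edges, and `reach_count` and `averaging_oracle` are hypothetical names.

```python
from collections import deque

def reach_count(live_edges, seeds, tau):
    """Number of nodes reachable from `seeds` within `tau` steps in one simulation."""
    out = {}
    for u, v in live_edges:
        out.setdefault(u, []).append(v)
    dist = {s: 0 for s in seeds}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        if dist[u] == tau:
            continue
        for v in out.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return len(dist)

def averaging_oracle(simulations, tau):
    """Return a set function that averages reachability over the given simulations."""
    def estimate(seeds):
        return sum(reach_count(sim, seeds, tau) for sim in simulations) / len(simulations)
    return estimate
```

Because every query is answered from the same fixed set of simulations, the returned function is itself a monotone submodular set function whenever each per-simulation reachability function is.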
The averaging oracle has some appealing properties: First, it is robust compared to estimators tailored to models that satisfy specific assumptions (see related work section) in that for any diffusion model , also with complex and unknown dependencies (between activation functions of different nodes or between edges in live-edge models), for any set , is an unbiased estimate of the exact influence value and estimates are accurate as long as the variance of is "sufficiently" small. Second, in terms of practicality, the oracle is directly available from simulations and does not require learning or inferring the underlying diffusion model that generated the data [39, 22, 21]. Therefore, the results are not sensitive to modeling assumptions and learning accuracy [8, 25]. Often, estimation of model parameters requires a large number of simulations: Even for IC models, Example 1.2 shows that edges with tiny probabilities that require many simulations to estimate can be critical for IM accuracy. Third, in terms of computation, when the reachability functions are monotone and submodular (as is the case with live-edge models), so is their average , and hence the oracle optimum can be approximated by the greedy algorithm. Prior work addressed the efficiency of working with averaging oracles by improving the efficiency of greedy maximization [30, 23] and applied sketches [9] to efficiently estimate values [7, 13].
The fundamental question we study here is the sample complexity of IM, that is, the number of i.i.d. simulations needed to recover an approximate maximizer of the influence function . Formally, for parameters , identify a seed set of size so that , where is the exact maximum. Note that the recovery itself is generally computationally hard and the sample complexity only considers the information we can glean from a set of simulations.
Kempe et al provided an upper bound of
[TABLE]
on the sample complexity of the harder Uniform Relative-Error Estimation (UREE) problem where for a given we bound the number of simulations so that with probability , for all subsets such that , approximates within relative error of . The sample complexity of UREE upper bounds that of IM because the oracle maximizer must be an approximate maximizer. We provide the argument for (1) here because it is basic and broadly applies to all SDMs: The reachability values , and hence their expectation, have values in . Using the multiplicative Chernoff bound (with values divided by ) we obtain that simulations guarantee a relative error of with probability at least for the estimate of any particular set . Interestingly, this bound is tight for point-wise estimation even for IC models: Example 1.1 shows a family of models where and simulations are required for estimating the influence value of a single node. The UREE sample complexity bound (1) follows from applying a union bound over all subsets.
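The Chernoff-plus-union-bound argument above can be written out as follows. This is a sketch of the standard calculation with explicit constants that are not taken from the paper; the symbols ($n$ nodes, seed sets of size at most $s$, $m$ simulations, relative error $\epsilon \le 1$, failure probability $\delta$, and $N$ for the number of candidate seed sets) are assumed notation, and seed sets are taken to be nonempty so that every scaled reachability value is at least $1/n$.

```latex
% X_i = (reachability of S in simulation i) / n, so X_i \in [1/n, 1] and \mu = E[X_i] \ge 1/n.
\begin{align*}
\Pr\Big[\Big|\tfrac{1}{m}\sum_{i=1}^{m} X_i - \mu\Big| \ge \epsilon\mu\Big]
  &\le 2\exp\!\big(-\epsilon^{2} m \mu/3\big)
   \le 2\exp\!\big(-\epsilon^{2} m/(3n)\big), \\
m \ge 3n\,\epsilon^{-2}\ln(2/\delta')
  &\;\Longrightarrow\; \text{failure probability at most } \delta' \text{ for one fixed set } S, \\
\delta' = \delta/N,\quad N \le (n+1)^{s}
  &\;\Longrightarrow\; m = O\!\big(\epsilon^{-2}\, n\,(s\ln n + \ln(1/\delta))\big)
   = O\!\big(\epsilon^{-2}\, s\, n \ln(n/\delta)\big).
\end{align*}
```

The factor $n$ enters only through the lower bound $\mu \ge 1/n$ on the scaled expectation, which is why sets with small influence drive the linear dependence on the number of nodes.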
The generic upper bound has a prohibitive linear dependence on the number of nodes (that Example 1.1 shows is unavoidable for UREE even for IC models). A simple example shows that we cannot hope for an umbrella improvement for IM: Consider the star graph family of Example 1.2 when edges are dependent so that either all edges are live or none is. Clearly simulations are necessary to detect a 1-step approximate maximizer (which must be the actual maximizer). The remaining hope is that we can obtain stronger bounds on the IM sample complexity for models with weaker or no dependencies such as the IC and IGT models. This question eluded researchers for nearly two decades.
Contributions and overview
We study the sample complexity of influence maximization from averaging oracles computed from i.i.d. simulations. One of our main contributions is an upper bound of
[TABLE]
on the IM sample complexity of independent strongly submodular SDMs. Informally, strong submodularity means that the influence function of any “reduced” model (model derived from original one by setting a subset of nodes as active) is submodular. The IC and IGT models are special cases of strongly submodular independent SDMs.
Interestingly, we provide similar sample complexity bounds for natural families of models that are not independent: Mixtures of a small number of strongly submodular SDMs and what we call -dependence live-edge models that allow for positive dependence of small groups of edges with a shared tail node.
Our bound improves over prior work by replacing the prohibitive linear dependence on the number of nodes in (1) with the typically much smaller value . While on worst-case instances unrestricted diffusions may require steps, understanding the sample complexity in terms of is important: First, IM with explicit step limits [5, 31, 20, 17, 14] is studied for applications where activation time matters. Moreover, due to the “small world” phenomenon [42], in “natural” settings we can expect most activations (even with unrestricted diffusions) to occur within a small number of steps. In the latter case, unrestricted influence values are approximated well by corresponding step-limited influence with .
Our improvement is surprising as generally a linear-in- number of simulations is necessary for estimating influence values of some nodes or to estimate essential model parameters (for example, the edge probabilities in IC models), and this is the case even when is very small. This shows that the maximization problem is in an information sense inherently easier and can circumvent these barriers.
We overview our results and implications – complete proofs can be found in the appendix. We review related work in Section 2 and place it in the context of our results. In Section 3 we formulate quality measures for influence oracles and relate unrestricted and step-limited influence. In particular, we observe that for IM it suffices that the oracle provides good estimates (within a small relative error) of larger influence values. This allows us to circumvent the lower bound for point-wise relative-error estimates shown in Example 1.1.
In Section 4 we state our main technical result that upper bounds by for independent strongly submodular SDMs. This variance upper bound facilitates estimates with small relative error for sets with larger influence values and additive error for sets with small influence values. We also provide a family of IC models that shows that the linear dependence on in the variance bound is necessary. We derive similar variance bounds for mixtures of strongly submodular independent SDMs and -dependence models. All our subsequent sample complexity bounds apply generically to any SDM (submodular/independent or not) that satisfies variance bounds of this form. In Section 5 we review averaging oracles and bound the sample complexity using variance upper bounds. In Section 6 we present our median-of-averages oracle that amplifies the confidence guarantees of the averaging oracle and facilitates a tighter sample complexity bound.
In Section 7 we provide a data-adaptive framework that provides guarantees while avoiding the worst-case sample complexity upper bounds on models when a smaller number of simulations suffices.
In Section 8 we consider computational efficiency and present a greedy maximization algorithm based on our median-of-averages oracles that returns a approximate maximizer with probability . The design generically applies to any SDM with a submodular influence function that satisfies the variance bounds.
2 Related work
Our focus here is on influence estimates obtained from averages of i.i.d. simulations of a model. We note that alternative approaches can be more effective for specific families of models. In particular, for IC models, state-of-the-art large-scale greedy approximate maximization algorithms [41, 40, 36, 26] are not based on simulation averages. These algorithms also build for each node a sample of its "influence set", but they do so from a finer building block: i.i.d. Reverse Reachability (RR) searches. The random RR search method was proposed in [9] to estimate the size of reachability sets in graphs and Borgs et al [3] adapted it to IC models.
The method can be applied in principle for any live-edge model: A basic RR search is conducted by selecting a node uniformly at random and performing a BFS on reversed edges that is pruned at length . The search "flips" edges as their head node is reached according to the conditional distribution on . The index number of the RR search is then added to the sample set of each node that is "reached" by the search. Influence of a subset can then be unbiasedly estimated from the cardinality of the union of the samples of nodes and the greedy algorithm can be applied to the sets of samples for approximate maximization. To obtain an approximate influence maximizer we need to perform RR searches until some node has a sample of size . In the worst case, this requires RR searches. For general live-edge models, an independent RR search can always be obtained from a simulation by randomly drawing a node and performing a reverse search from it using edges . The same simulation, however, cannot generally be reused to generate multiple independent RR searches. This way of obtaining RR searches works for general live-edge models (with arbitrary dependencies) but requires simulations, which does not improve over the generic upper bound (1).
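The basic RR search described above can be sketched in Python for the IC case, where edges are flipped lazily with their marginal probabilities. This is an illustrative sketch of the edge-flipping variant, which is valid only when edges are independent; the names `rr_search` and `estimate_influence` and the `in_edges` layout (node mapped to a list of `(source, probability)` pairs) are assumptions.

```python
import random
from collections import deque

def rr_search(in_edges, nodes, tau, rng):
    """One reverse-reachability search from a uniformly random root, pruned at depth tau."""
    root = rng.choice(nodes)
    dist = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        if dist[v] == tau:
            continue
        # Incoming edges of v are flipped only now, when v is first expanded.
        for u, p in in_edges.get(v, []):
            if u not in dist and rng.random() < p:
                dist[u] = dist[v] + 1
                queue.append(u)
    return set(dist)

def estimate_influence(rr_sets, seeds, n):
    """Unbiased estimate: n times the fraction of RR searches that hit the seed set."""
    seeds = set(seeds)
    hits = sum(1 for rr in rr_sets if rr & seeds)
    return n * hits / len(rr_sets)
```

A seed set intersects an RR search exactly when the random root is reachable from the seed set within the step limit, so the hit fraction estimates the influence divided by the number of nodes.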
The appeal of the RR searches method is that it can be implemented very efficiently for independent live-edge (including IC or LT) models. The total work performed requires only "edge flips" that can be easily performed using specified edge probabilities for IC models. Moreover, the basic building block of RR searches are local simulations of sets of incoming edges of specified nodes and the full computation requires at most local simulations for each node. When we have full simulations generated by an independent live-edge model these “local” simulations are independent and the required number of "local simulations" can be obtained by decomposing full simulations. But the caveat is that this approach breaks the coherence of simulations, as we construct each RR search from components taken from multiple simulations. These "efficient" implementations (i.e., based on decomposed simulations or edge flips according to marginal probabilities) may "catastrophically fail" when dependencies exist: The influence estimates obtained are biased and cause large errors even when the variance is low. Example 2.1 shows a simple mixture model (of two degenerate IC models) where "efficient" RRS has large error due to bias but averages of few simulations provide accurate estimates. To summarize, with RRS, the implementation that works with full simulations is robust to dependencies but is inefficient and the efficient implementation breaks ungracefully even with light dependencies. Thus we believe that both basic approaches to approximate IM, simulation averages and RRS, offer distinct advantages: Simulation averages are robust in that they remain unbiased and are accurate on any SDM, including dependent ones, for which the variance is sufficiently small, whereas RRS offers more efficiency with purely independent live-edge models.
3 Preliminaries
We consider stochastic diffusion models as outlined in the introduction. We denote by the -steps reachability set of when we use a specific set of activation functions. We will use the notation (with the parameter omitted) for the random variable obtained when we draw according to the model.
Utility functions
For simplicity, the discussion in the introduction took the utility of a reachable set to be the number of reachable nodes . Generally, we can consider utility functions that are nonnegative monotone non-decreasing with :
[TABLE]
Submodular utility is particularly natural and studied by Mossel and Roch [33]. Additive utility is the special case where nodes have nonnegative weights and
[TABLE]
We consider a diffusion model together with a utility function . The random variable is the utility of the reachable set, that is, when . The influence function is then the expected utility of the reachable set
[TABLE]
We denote the maximum influence value of a subset of cardinality by .
It follows from the definition that for any SDM with utility , the influence is monotone non-decreasing in and in the set and the optimum values are non-decreasing in and . Generally, influence functions of SDMs may not be submodular even when utility is additive. The influence function is submodular for live-edge and for IGT models [33] with submodular utility.
Reduced models
We work with the following notion of model reduction. Let be an independent SDM with submodular utility. For a set of nodes , we define the reduced model of with respect to : The reduced model contains the nodes . The activation functions for are obtained by drawing conditioned on and take
[TABLE]
(Note that since we have independent SDM we can separately consider the distribution of activation functions of each node). The utility in is the marginal utility in with respect to :
[TABLE]
The reduced model is also an independent SDM with submodular utility: Activation functions are independent and monotone and the utility is monotone with and submodular.
Strongly submodular SDM
We say that an independent SDM is strongly submodular if the utility function is submodular and the influence function is submodular with any reduced model and step limit . IC and IGT models are strongly submodular SDMs (see Theorem A.2).
The variance and thus sample complexity upper bounds that we present in the sequel apply to any strongly submodular SDM. We will also provide bounds for some dependent families of models. One family is a slight generalization of IC models that we refer to as -dependence. Here edges are partitioned into disjoint groups, where each group contains at most edges emanating from the same node. The edges in a group must be either all live or none live (are positively dependent).
3.1 Relating step-limited and unrestricted Influence
When unrestricted diffusion from a seed set is such that most activations occur within steps, the unrestricted influence is approximated well by -step influence . We can also relate unrestricted influence with small expected steps-to-activation to step-limited influence: For a seed set , node , and length , we denote by the probability that node is activated in a diffusion from in step . For additive utility functions (4) by definition, . The expected length of an activation path from (in unrestricted diffusion) is:
[TABLE]
The following lemma is an immediate consequence of Markov’s inequality and shows that -stepped influence with approximates well the unrestricted influence:
Lemma 3.1.
For all and , .
3.2 Influence Oracles
We say that a set function is an -approximation of another set function at a point if , where . That is, the estimate has a small relative error for sets with and a small absolute error of for sets with . We say that provides a uniform -approximation for all subsets in a collection if is an -approximation for all .
An influence oracle, , is a randomized data structure that is constructed from a set of i.i.d. simulations of a model. The influence oracle, , defines a set function (we use the same name for the set function) that for any input query set , returns a value . For and we say that an oracle provides approximation guarantees with respect to if for any set it is an -approximation with probability at least . That is
[TABLE]
where . Example 1.1 shows that this type of requirement is what we can hope for with an oracle that is constructed from a small number of simulations.
The requirements are for each particular set . If we are interested in stronger guarantees that with probability the approximation uniformly holds for all sets in a collection , we can use an oracle that provides guarantees. The -approximation guarantee for all sets in then follows using a union bound argument: The probability that all sets are approximated correctly is at least .
4 Variance Bounds
We consider upper bounds on the variance of the reachability of a set of nodes that have the following particular form
[TABLE]
for some . The sample complexity bounds we present in the sequel apply to any SDM that satisfies these bounds. In the remaining part of this section we state our variance upper bound for strongly submodular SDMs and extensions and a tight worst-case lower bound for IC models.
4.1 Variance upper bound
The following key theorem facilitates our main results. We show that any strongly submodular SDM satisfies the bound (8) with . The proof is technical and provided in Appendix A.
Theorem 4.1 (Variance Upper Bound Lemma).
Let be a strongly submodular SDM. Then for any step limit , and a set of nodes we have
[TABLE]
Some natural dependent SDMs have a variance bound of the form (8): (See Appendix F for proofs.)
Corollary 4.1.
IC models with -steps and -dependence satisfy the bound (8) with . Any mixture of -steps strongly submodular SDMs where each model has probability at least satisfies the bound (8) with .
4.2 Variance lower bound
We provide a family of IC models for which this variance upper bound is asymptotically tight. This shows that the dependence of the variance bound on is necessary.
Theorem 4.2 (Variance Lower Bound).
For any there is an IC model with a node of maximum influence such that
Our family of models are such that is a complete directed binary tree of depth rooted at with all edges directed away from the root and for all . We show (details in Appendix B) that:
[TABLE]
5 The Averaging Oracle
The averaging oracle uses i.i.d. simulations . For a query it returns the average utility of the reachability set of : We quantify the approximation guarantees of an averaging oracle in terms of a variance bound of the form (8).
Lemma 5.1.
Consider an SDM that for some satisfies a variance bound of the form (8). Then for any , an averaging oracle constructed from i.i.d. simulations provides guarantees.
In particular for strongly submodular SDMs, we use the variance bound in Theorem 4.1 and obtain these approximation guarantees using i.i.d. simulations.
Proof.
Using variance properties of the average of i.i.d. random variables, we get that for any query
[TABLE]
The claims follow using Chebyshev’s inequality that states that for any random variable $X$ and $a > 0$, $\Pr[|X - \mathbb{E}[X]| \ge a] \le \mathrm{Var}[X]/a^2$. We apply it to the random variable that has expectation and plug in the variance bound. To establish (6) we use and to establish (7) we use . ∎
5.1 Sketched averaging oracle
For live-edge models with additive utility (4), the query efficiency of the averaging oracle can be improved with off-the-shelf use of -step combined reachability sketches [9, 13, 14, 10]. The sketching is according to a sketch-size parameter that also determines the sketch computation time and the accuracy of the estimates that the sketches provide. A sketch of size is computed for each node so that for any set of nodes , can be efficiently estimated from the sketches of the nodes . The computation of the sketches from an arbitrary set of simulations uses at most edge traversals, where is the maximum in-degree of node over simulations . In the case of an IC model, the expected number of traversals is . Sketching with general node weights can be handled as in [10]. The estimates obtained from the sketches are unbiased with coefficient of variation and are concentrated: Sketches of size provide estimates with relative error with probability .
6 Confidence Amplification: The median-of-averages oracle
The statistical guarantees we provide for our averaging oracle are derived from variance bounds. The limitation is that the number of simulations we need to provide guarantees is linear in and therefore the number of simulations we need to provide uniform guarantees (via a union bound argument) grows linearly with the number of subsets.
In order to find an approximate optimizer, we would like to have a uniform -approximation for all the subsets of size at most but doing so with an averaging oracle would require too many simulations. We adapt to our setting a classic confidence amplification technique [1] to construct an oracle where the number of simulations grows logarithmically in the confidence parameter .
A median-of-averages oracle is specified by a number of pools with simulations in each pool. The oracle is therefore constructed from i.i.d. simulations for and .
The simulations of each pool are used in an averaging oracle that for the th pool () returns the estimates . The median-of-averages oracle returns the median value of these estimates
[TABLE]
We establish that when the i.i.d simulations are from a model that has variance bound (8) for some , the median-of-averages oracle provides approximation guarantees using i.i.d. simulations.
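The construction can be sketched in Python as follows. This is an illustrative sketch for live-edge simulations given as lists of `(u, v)` edges; the names `reach_count` and `median_of_averages_oracle` are hypothetical, and the number of pools and the pool size are left to the caller rather than set to the values used in the analysis.

```python
import statistics
from collections import deque

def reach_count(live_edges, seeds, tau):
    """Number of nodes reachable from `seeds` within `tau` steps in one simulation."""
    out = {}
    for u, v in live_edges:
        out.setdefault(u, []).append(v)
    dist = {s: 0 for s in seeds}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        if dist[u] == tau:
            continue
        for v in out.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return len(dist)

def median_of_averages_oracle(pools, tau):
    """`pools` is a list of pools; each pool is a list of simulations."""
    def estimate(seeds):
        averages = [
            sum(reach_count(sim, seeds, tau) for sim in pool) / len(pool)
            for pool in pools
        ]
        return statistics.median(averages)   # robust to a minority of bad pools
    return estimate
```

Unlike a plain average, the median is generally not a submodular function of the seed set even when each pool average is, which is one reason maximization over this oracle needs separate treatment.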
Lemma 6.1.
Consider an SDM that for some satisfies the variance bound (8). Then for every and , a median-of-averages oracle organized with pools of simulations in each provides approximation guarantees.
Proof.
An averaging oracle with simulations provides approximation guarantees for . Therefore, the probability of a correct estimate for any subset is at least . We now consider the estimates obtained from the pools when sorted in increasing order. The estimates that are not correct (too low or too high) will be at the prefix and suffix of the sorted order. The expected number of correct estimates is at least . The probability that the median estimate is not correct is bounded by the probability that the number of correct estimates is , which is . From multiplicative Chernoff bounds, the probability of a sum of Bernoulli random variables being below is at most . Using we have . ∎
As a corollary, we obtain a sample complexity bound for influence maximization from variance bounds:
Theorem 6.1.
Consider an SDM that satisfies the variance bound (8) for some . Then for any and , using i.i.d. simulations we can return such that
[TABLE]
Proof.
We construct a median-of-averages oracle with and where . From Lemma 6.1 using a union bound over the sets we obtain that with probability the oracle provides a uniform -approximation for all subsets of size at most . Let be a set with maximum influence and let be the oracle optimum
[TABLE]
We have
[TABLE]
We comment that the ratio is not tight and we can obtain a bound closer to . This is because the particular set can be approximated more tightly by the oracle (that uses enough simulations to support a union bound).
∎
7 Optimization with Adaptive sample size
The bound on the number of simulations we derived in Theorem 6.1 (through a median-of-averages oracle) and also the naive bound (1) (for the averaging oracle) are worst-case. This is obtained by using enough simulations to have the oracle provide a uniform -approximation with probability at least on any problem instance. To obtain the uniform approximation we applied a union bound over subsets that resulted in an increase in the number of required simulations by an factor over the base approximation guarantees.
On real data sets a much smaller number of simulations than this worst case often suffices. We are interested in algorithms that adapt to such data and return a seed set of approximate maximum influence using a correspondingly smaller number of simulations while providing statistical guarantees on the quality of the end result. To do so, we apply an adaptive optimization framework [11] (some example applications are [13, 36, 15, 12]). This framework consists of a “wrapper” that takes as inputs oracle constructions from simulations and a base algorithm that performs an optimization over an oracle. The wrapper invokes the algorithm on oracles constructed using an increasing number of simulations until a validation condition on the quality of the result is met. The details are provided in Appendix E. We denote by the number of simulations that provides guarantees and we obtain the following results:
Theorem 7.1**.**
Suppose that on our data the averaging (respectively, median-of-averages) oracle has the property that with simulations, with probability at least , the oracle optimum satisfies
[TABLE]
Then with probability at least , when using simulations with the median-of-averages oracle and simulations with the averaging oracle, the wrapper outputs a set such that .
The wrapper can also be used with a base algorithm that is an approximation algorithm. For live-edge models, our averaging oracle is monotone and submodular and hence we can apply greedy to efficiently compute a set with approximation ratio at least (with respect to the oracle). If we use greedy as our base algorithm we obtain the following:
Theorem 7.2**.**
If the averaging oracle is submodular and has the property that with simulations, with probability at least , it provides a uniform -approximation for all subsets of size at most , then with simulations we can find in polynomial time a approximate solution with confidence .
8 Approximate Greedy Maximization
In this section we consider the computational efficiency of maximization over our oracle that approximates a monotone submodular influence function . The maximization problem is computationally hard: The brute force method evaluates on all subsets of size in order to find the oracle maximizer. An efficient algorithm for approximate maximization of a monotone submodular function is greedy that sequentially builds a seed set by adding a node with maximum marginal contribution at each step. To implement greedy we only need to evaluate at each step the function on a linear number of subsets for and thus overall we do evaluations of on subsets. With a monotone and submodular , for any the subset that consists of the first nodes in a greedy sequence satisfies [35]:
[TABLE]
If our function provides a uniform -approximation of another function for all subsets of size at most , then (see the proof of Theorem 6.1).
The averaging oracle is monotone and submodular [29] when reachability functions are as in live-edge models. Unfortunately, our median-of-averages oracle, which facilitates tighter bounds on the number of simulations, is monotone but may not be submodular even for models where the averaging oracle is submodular. Generally, when this is the case, greedy may fail (as highlighted in recent work by Balkanski et al. [2]).
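The greedy procedure described above can be sketched generically for any set-function oracle. This is a minimal sketch under assumptions made here: the name `greedy_sequence` is hypothetical, the oracle is any callable on frozensets of nodes, and ties are broken by the order of `nodes`.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List


def greedy_sequence(
    nodes: Iterable[Hashable],
    oracle: Callable[[FrozenSet[Hashable]], float],
    k: int,
) -> List[Hashable]:
    """Return the first k nodes of a greedy sequence with respect to `oracle`."""
    remaining = list(nodes)
    current: FrozenSet[Hashable] = frozenset()
    sequence: List[Hashable] = []
    for _ in range(min(k, len(remaining))):
        base_value = oracle(current)
        # Evaluate the oracle on a linear number of candidate sets per step
        # and keep the node with maximum marginal contribution.
        best = max(remaining, key=lambda v: oracle(current | {v}) - base_value)
        sequence.append(best)
        remaining.remove(best)
        current = current | {best}
    return sequence
```

For example, with a coverage oracle that returns the size of the union of per-node reach sets, the sequence starts with the node of largest reach and continues with the node that adds the most uncovered elements.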
Fortunately, greedy is effective on a function that is monotone but not necessarily submodular as long as "closely approximates" a monotone submodular in that marginal contributions of the form
[TABLE]
are approximated well by [13]. We apply this to establish the following lemma:
Lemma 8.1**.**
*The greedy algorithm applied to a function that is monotone and provides a uniform -approximation of a monotone submodular function where returns a set such that . *
Our proof of Lemma 8.1 generally applies to an approximate oracle of any monotone submodular function and is presented in Appendix C. For approximate IM we obtain the following as a corollary:
Theorem 8.1**.**
Consider a submodular SDM that for some satisfies the variance bound (8). Consider a median-of-averages oracle constructed with simulations of arranged as pools with simulations each. Then with probability , the set that contains the first nodes returned by greedy on the oracle satisfies .
Proof.
From Lemma 6.1, with appropriate constants, this configuration provides us with approximation guarantees. From Lemma 8.1 greedy provides the stated approximation ratio. ∎
Greedy on the median-of-averages oracle can be implemented generically for any SDM by explicitly maintaining the reachability sets for all nodes in each simulation as the greedy selects nodes into the seed set . For each step, we compute the oracle value (see (9)) and select for which the value for is maximized:
[TABLE]
We obtain approximation guarantees, however, only when the conditions of monotone submodular influence function and variance bounds are satisfied. For specific families of models, we can consider tailored efficient implementations that incrementally maintain reachability sets and values.
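The generic implementation with explicit reachability sets described above can be sketched as follows. This is an illustrative sketch only. It assumes, for simplicity, that the utility of a reached set is its cardinality and that each simulation is given as a precomputed map from a node to the set of nodes it reaches (itself included); the names `oracle_with_candidate` and `greedy_median_of_averages` are hypothetical.

```python
import statistics
from typing import Dict, Hashable, List, Sequence, Set

# One simulation: maps each node to the set of nodes it reaches (itself included).
Simulation = Dict[Hashable, Set[Hashable]]


def oracle_with_candidate(pools, covered, candidate) -> float:
    """Median over pools of the average reach of the current seeds plus `candidate`."""
    pool_averages = []
    for pool, pool_covered in zip(pools, covered):
        total = sum(
            len(already | sim.get(candidate, set()))
            for sim, already in zip(pool, pool_covered)
        )
        pool_averages.append(total / len(pool))
    return statistics.median(pool_averages)


def greedy_median_of_averages(
    pools: List[List[Simulation]], nodes: Sequence[Hashable], k: int
) -> List[Hashable]:
    """Greedy seed selection on a median-of-averages oracle."""
    # covered[i][j] is the set reached by the current seeds in simulation j of pool i.
    covered = [[set() for _ in pool] for pool in pools]
    candidates = list(nodes)
    seeds: List[Hashable] = []
    for _ in range(min(k, len(candidates))):
        best = max(candidates, key=lambda v: oracle_with_candidate(pools, covered, v))
        seeds.append(best)
        candidates.remove(best)
        for pool, pool_covered in zip(pools, covered):
            for sim, already in zip(pool, pool_covered):
                already |= sim.get(best, set())  # update reach of the seed set in place
    return seeds
```

The sketch recomputes each candidate's value from scratch at every step; the tailored implementations mentioned in the text avoid this by maintaining values incrementally.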
For live-edge models with additive utility (4) we consider an implementation of greedy on a median-of-averages oracle. This can be done by explicit maintenance of reachability sets or by using sketches [9, 13, 14, 10] (see Section 5.1). We obtain the following bounds (the proof is deferred to Appendix D):
Theorem 8.2**.**
Let be a live-edge model with an additive utility function (4) that satisfies the variance bound (8). Then greedy on a median-of-averages oracle can be implemented with explicit reachability sets in time
[TABLE]
where is the average number of edges per simulation (For an IC model, and ). When using sketches, the time bound is
[TABLE]
where . For an IC model, and in expectation.
Conclusion
We explore the "sample complexity" of IM on stochastic diffusion models and show that an approximate maximizer (within a small relative error) can be recovered from a small number of simulations as long as the variance is appropriately bounded. We establish the variance bound for the large class of strongly submodular stochastic diffusion models. This includes IC models (where edges are drawn independently) and IGT models (where node thresholds are drawn independently) and natural extensions that allow for some dependencies. Our sample complexity bound significantly improves over the previous bounds by replacing the linear dependence in the number of nodes by a logarithmic dependence on the number of nodes and linear dependence on the length of the activation paths (which are usually very short). An interesting question for future work is to address the gap between the sample complexity and the larger number of simulations currently needed for greedy maximization.
Acknowledgements
This research is partially supported by the Israel Science Foundation (Grant No. 1841/14).
Appendix A Variance upper bound: Proof of Theorem 4.1
In this section we prove Theorem 4.1 which upper bounds the variance in strongly submodular SDMs. We start by bounding the variance in a more basic setting of a submodular function over a random subset in Section A.1 (Theorem A.1). This will be an ingredient in our main proof provided in Section A.3.
We will be using the following basic tools:
Lemma A.1**.**
If are two random variables on the same probability space and the variance of is finite, then:
[TABLE]
[TABLE]
where is a random variable that gets the expectation of conditioned on the value of and is a random variable that gets the variance of conditioned on the value of .
When is a Bernoulli random variable then Lemma A.1 gives that
[TABLE]
where , , , and .
A.1 Submodular monotone functions on random subsets
Let be a set with elements and let be a set of probabilities such that is associated with the element . Let be a random subset of that contains with probability independently for each . That is
[TABLE]
We say that is a random subset of using probabilities .
A submodular monotone function over is a function with the following properties:
2. For every with and for every we have that
3.
For any singleton we write instead of . Let . Our purpose in this subsection is to establish the following:
Theorem A.1**.**
Let be a random subset of using probabilities and let be a submodular monotone function. Then
[TABLE]
We give the following additional definitions and lemmas before proving this theorem.
Let and let . We define to be a random subset of using the probabilities . Let be submodular functions over defined by and . Let
[TABLE]
and
[TABLE]
By our definitions and from total expectation (Lemma A.1), .
Lemma A.2**.**
Let be a submodular monotone function over and a random subset of using probabilities . Then,
[TABLE]
Proof.
Since is obtained by drawing the elements in independently it follows that
[TABLE]
[TABLE]
∎
Lemma A.3**.**
For any submodular monotone function over and for any index we have that and .
Proof.
The first inequality follows immediately from our definition since
[TABLE]
For the second inequality we use submodularity as follows
[TABLE]
∎
We are now ready for the proof of Theorem A.1.
Proof.
(of Theorem A.1) The proof is by induction on the size of .
Base case: Let we have that
[TABLE]
and
[TABLE]
It is left to prove that $\textsc{E}\left[f(X)-f(\emptyset)\right]\geq p_{1}(1-p_{1})\big[f(a_{1})-f(\emptyset)\big]$, and indeed we have that
[TABLE]
Inductive Step: Assume the theorem holds for sets of size and any submodular function and probabilities . Consider a set with elements and a submodular function over , and let be an arbitrary index.
From the total variance formula in Lemma A.1 we know that
[TABLE]
where , , , and .
By applying the induction hypothesis to with probabilities and and and we get that
[TABLE]
and
[TABLE]
Substituting these bounds in Equation (12) we get that
[TABLE]
∎
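The statement of Theorem A.1 can be checked numerically on small instances by enumerating all subsets. The sketch below is illustrative and rests on an assumption: the inequality it tests, $\textsc{Var}[f(X)] \leq \max_a \big(f(a)-f(\emptyset)\big)\cdot \textsc{E}[f(X)-f(\emptyset)]$, is the form suggested by the base case of the proof, since the displayed statement itself is not reproduced in this text. The function names are hypothetical.

```python
from itertools import product
from typing import Callable, FrozenSet, Hashable, Sequence, Tuple


def exact_mean_and_variance(
    elements: Sequence[Hashable],
    probs: Sequence[float],
    f: Callable[[FrozenSet[Hashable]], float],
) -> Tuple[float, float]:
    """Exact mean and variance of f(X) for a random subset X with independent inclusions."""
    mean = 0.0
    second_moment = 0.0
    for mask in product((False, True), repeat=len(elements)):
        weight = 1.0
        for included, p in zip(mask, probs):
            weight *= p if included else 1.0 - p
        value = f(frozenset(e for e, included in zip(elements, mask) if included))
        mean += weight * value
        second_moment += weight * value * value
    return mean, second_moment - mean * mean


def variance_bound_holds(elements, probs, f, tol: float = 1e-12) -> bool:
    """Check Var[f(X)] <= max_a (f(a) - f(empty)) * E[f(X) - f(empty)] (assumed form)."""
    empty_value = f(frozenset())
    max_single_gain = max(f(frozenset([e])) - empty_value for e in elements)
    mean, variance = exact_mean_and_variance(elements, probs, f)
    return variance <= max_single_gain * (mean - empty_value) + tol
```

A coverage function (the size of a union of sets) is monotone and submodular, so it is a convenient test input. The enumeration is exponential in the number of elements and is only meant for sanity checks on tiny instances.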
A.2 Properties of reduced diffusion models
We establish some properties of reduced independent SDMs that are needed for our upper bound.
We first show that influence values of nodes in a reduced model can only be lower than respective values in the original model:
Lemma A.4**.**
Let be a reduction of a model .
[TABLE]
Proof.
Note that is obtained from by removing nodes. Therefore respective reachability sets given are such that those in can only be subsets of those in :
[TABLE]
Then from monotonicity and submodularity of we get
[TABLE]
(second inequality follows from monotonicity and submodularity of so that for all , .) Therefore,
[TABLE]
∎
A convenient property is that reduction preserves strong monotone submodularity:
Lemma A.5**.**
A reduction of a strongly monotone submodular model is also strongly monotone submodular.
Proof.
A reduced model with respect to of a reduced model of with respect to is a reduced model of with respect to . Also note that the reduced utility function is monotone and submodular. ∎
We next show that IC or IGT models with submodular utility are closed under reduction:
Theorem A.2**.**
IC and IGT models with submodular utility are strongly submodular SDMs.
Proof.
We first show that IC/IGT models are independent SDMs. In the introduction we expressed IC and IGT models as SDMs: A live-edge model is expressed as an SDM using if and only if there is an edge from a node in to . The model is independent if for all the edges incoming to are independent of all other edges. In IC models all edges are independent and hence IC models are independent SDMs. Recall (from the Introduction) that an IGT model is expressed as an SDM using . In an IGT model the thresholds are independent random variables, and hence are independent. Hence, an IGT model is an independent SDM. Submodularity of influence when utility is submodular is established for IC models in [29] and for IGT models in [33].
Reduction of any model preserves submodularity of the utility and in particular this holds for reduced IC/IGT models. What remains to show is that a reduced IC/IGT model is also an IC/IGT model (respectively). This would conclude the proof of strong submodularity since any IC or IGT model with submodular utility has a submodular influence function.
To establish this remaining claim we consider IC/IGT models and express the reduction in terms of the activation functions as one in terms of the respective family of models.
We first consider IC models. The reduced IC model is obtained from by deleting the nodes and their incident edges and keeping on the remaining edges. This is clearly an IC model. It remains to show that this is equivalent to the reduction of the distribution of activation functions. The conditioning that is equivalent to a live-edge set with no edges from to . For such an edge set, for any we have , which corresponds to having at least one edge from to . From independence of edges, the conditional distribution is also independent and retains the same inclusion probabilities.
We next consider IGT models. The reduction in terms of the distribution of activation functions is equivalent to modifying the functions so that
[TABLE]
The reduced model is clearly an IGT model. The conditioning that means that . Therefore, the conditional distribution of provided it was not activated in the first step is uniform on . The probability that given this conditioning is equal to the probability that . ∎
A.3 Upper bound on the variance in strongly submodular SDM
Let be a -stepped diffusion model. We denote by the maximum influence of a single node in that is not included in :
[TABLE]
As before, we omit if it can be understood from the context. We prove the following theorem which is a restatement of Theorem 4.1.
Theorem A.3**.**
Let be a strongly submodular SDM. Then for any and a set of nodes :
[TABLE]
The remaining part of this Subsection contains the proof of the Theorem.
Let be a set of nodes, and let
[TABLE]
be the nodes that have nonzero probability to be activated if is active. For the special case of IC models, is the set of outgoing neighbors of .
We first consider the case where is empty. In this case, for all . Therefore, , , and and the claim holds.
We now assume that is not empty and give a proof by induction on .
A.3.1 Base case ()
Let
[TABLE]
be the probability that node is activated in step provided that the set of nodes was active at step [math]. From independence of the model, the events of activating different nodes at step 1 are independent. We have that the set of nodes that is active at step 1 is a random subset of with probabilities as defined in Subsection A.1. Moreover, from monotonicity and submodularity of , the function is monotone and submodular with . We can therefore apply Theorem A.1 to bound the variance of :
[TABLE]
We now note that
[TABLE]
and
[TABLE]
For all we have . Therefore
[TABLE]
Substituting in (14) we obtain the claim
[TABLE]
A.3.2 Inductive step
We define to be the random variable that is the -steps reachability of in a diffusion on seeded with that is conditioned on the event that exactly the nodes (and no other nodes) are activated in step 1. Equivalently, we condition on such that for , and for , . We respectively define to be the random variable . From definition, we have
[TABLE]
We consider the reduced model of with respect to and show that the conditioned steps diffusion from in is equivalent to the unconditioned steps diffusion from in :
Lemma A.6**.**
For any and , the random variables and have identical distributions over subsets. The random variables and have identical distributions over values.
Proof.
We first consider . For a draw of conditioned activation functions we have . By definition, we also have and the claim holds.
We next consider . We first observe that in both situations, (i) the reduced model when seeded with and (ii) the conditioned diffusion in seeded with such that the nodes are activated in the first step, the progression is determined only by the activation functions on the nodes .
We next argue that the distribution of activation functions projected on the nodes is the same in both situations. From independence of it suffices to consider separately the activation functions of each node. From definition of a reduced model, we draw for each , conditioned on . This is exactly what we get for the conditioned diffusion in .
We can thus match the supports (sets of activation functions) in both situations so that and are matched when the projections on are the same. The starting points are at steps [math] of the reduced model and step of the conditioned process is , the progression of new activations is thus the same. Therefore, for any step ,
[TABLE]
and the first claim follows.
For the second claim, note that and thus
[TABLE]
where the equalities are those of distributions. ∎
As immediate corollaries we can relate expectations and variance of as follows:
[TABLE]
[TABLE]
Total Variance:
We define the random variable to be the subset of which is activated after the first step. Note that is a random subset of using probabilities for as defined in Section A.1. By the total variance formula we get that
[TABLE]
We bound the total variance by separately bounding the two terms.
Bound on the first term of the total variance:
We consider the reduced model with respect to and a restriction of the influence function to the domain that is subsets :
[TABLE]
From Lemma A.6, this function represents the expected marginal utility value of nodes which are not in that are activated after steps if we activate at step [math] and the set at step .
We first observe that is monotone and submodular. This is because strong monotone submodularity of our model implies that the reduced model is also strongly monotone and submodular, and a restriction of a monotone and submodular function is also monotone and submodular. We establish two helpful properties of . First,
[TABLE]
which holds for any influence function. Second, using Lemma A.4 we obtain
[TABLE]
We are now ready to bound the first term of the total variance (18). Our monotone submodular function and the random subset using probabilities satisfy the conditions of Theorem A.1.
[TABLE]
Bound on the second term of the total variance:
We next bound the second term of (18) which is the expectation of the variance conditioned on :
[TABLE]
where we take
[TABLE]
to be the subset that maximizes the ratio.
Using the induction hypothesis on -stepped influence we get
[TABLE]
We now relate the maximum influence of nodes in the original and reduced models:
[TABLE]
From (22) using (23) and (24) we obtain
[TABLE]
Combining the bounds of the first and second terms
The claim of the Theorem follows using total variance (18) and the bounds on the first term (21) and second term (25).
Appendix B Variance lower bound construction
Lemma B.1**.**
Let be a complete binary tree where each edge has probability and let be the height of the node . Then, and .
Proof.
By induction on the height of the node.
Base step (): It is clear that and since is a leaf.
Inductive step: has two neighbors and each is reached with probability . Let and be the neighbors of and let be random variables that indicate if were activated respectively. The variables are Bernoulli random variables with , hence, and . Since the graph is a tree, the reachabilities of and are independent random variables, so we can simply write:
[TABLE]
The variables and are identical and and are independent random variables. Thus,
[TABLE]
[TABLE]
The computation of the variance is similar:
[TABLE]
For two independent random variables it holds that , so we have that:
[TABLE]
[TABLE]
[TABLE]
∎
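The recursion used in the proof above can be checked numerically. The sketch below is illustrative and makes assumptions that are not taken from the text: the reachability of a node counts the node itself (so a leaf has mean 1 and variance 0), edges point from a node to its two children, and the edge probability `p` is left as a parameter. It computes the moments from the recursion $R_v = 1 + X_1 R_{v_1} + X_2 R_{v_2}$ and compares them with brute-force enumeration over all live-edge sets of a small tree.

```python
from itertools import product
from typing import Tuple


def tree_moments(height: int, p: float) -> Tuple[float, float]:
    """Mean and variance of the reachability of a node at the given height."""
    mean, var = 1.0, 0.0  # a leaf reaches only itself (assumption of this sketch)
    for _ in range(height):
        # For X ~ Bernoulli(p) independent of R: Var[X R] = p Var[R] + p (1 - p) E[R]^2.
        mean, var = 1 + 2 * p * mean, 2 * (p * var + p * (1 - p) * mean * mean)
    return mean, var


def brute_force_moments(height: int, p: float) -> Tuple[float, float]:
    """Same moments by enumerating every live-edge set (small heights only)."""
    n = 2 ** (height + 1) - 1  # nodes in heap order; node c > 0 has parent (c - 1) // 2
    mean = second_moment = 0.0
    for live in product((False, True), repeat=n - 1):
        weight = 1.0
        for is_live in live:
            weight *= p if is_live else 1 - p
        reached = [True] + [False] * (n - 1)
        for child in range(1, n):
            # live[child - 1] is the state of the edge entering `child`.
            reached[child] = live[child - 1] and reached[(child - 1) // 2]
        count = sum(reached)
        mean += weight * count
        second_moment += weight * count * count
    return mean, second_moment - mean * mean
```

With `p = 0.5` the mean grows linearly in the height under these assumptions, while the variance grows faster, which is the behavior the lower-bound construction relies on.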
Theorem B.1**.**
There is a model and a set of nodes such that .
Proof.
Lemma B.1 shows that for every node , and . It follows that the root has the largest influence and . Furthermore since the nodes of the largest influence in are the children of . We conclude that:
[TABLE]
∎
Appendix C Greedy optimization with approximate non-submodular oracle
In this section we present the proof of Lemma 8.1. We show that our approximation guarantees imply that the application of greedy on generates a sequence that is an approximate greedy sequence (in the sense of Lemma C.1) with respect to .
We first state a helpful lemma [13], which establishes that it suffices to approximate the marginal contributions
[TABLE]
.
Lemma C.1**.**
[13] Given a monotone submodular function , an approximate greedy algorithm that for some selects at each step an element such that has approximation ratio .
Proof.
It is easy to see that the approximation ratio of -approximate greedy is . It therefore suffices to establish that this expression is larger than for Equivalently, we need to show that for all and
[TABLE]
This follows from equality holding for and and the function being concave up (second derivative is positive). ∎
Proof of Lemma 8.1.
Consider a monotone non-negative that is a uniform -approximation of a monotone non-negative with . By definition of -approximation (see Section 3.2), for all with . Therefore,
[TABLE]
Inequality (26) follows immediately when because the relative error is at most . For we have absolute error being at most which is a relative error of at most . Inequality (27) follows from the absolute error being at most and .
We establish that these conditions imply that greedy on on the prefix of the greedy sequence where is actually approximate greedy (as in the conditions of Lemma C.1) with respect to . Note that for and thus the prefix restriction does not limit generality. The claim will then follow from Lemma C.1.
For , it follows from Equations (26) and (27) that the first element of a greedy sequence with respect to , , satisfies . Therefore, from the second iteration on, we have a set for which the relative error bound in Equation (26) applies.
We consider the marginal contributions for any node . We have
[TABLE]
We use these inequalities to bound the absolute error of (any) marginal influence estimate by
[TABLE]
We now consider the node with maximum marginal contribution to with respect to and its contribution value
[TABLE]
Thus, when ,
[TABLE]
By applying (C) to we get that
[TABLE]
Therefore the node with maximum marginal contribution according to satisfies
[TABLE]
By using (C) again, substituting (29), and using the fact that :
[TABLE]
Therefore, the greedy sequence according to is an approximate greedy sequence according to and satisfies the conditions of Lemma C.1. Therefore the resulting sequence yields an approximation ratio at least . ∎
Appendix D Greedy for live-edge models
Proof of Theorem 8.2:
Proof.
For the first bound, we explicitly maintain for each node , for each pool, the reachability set of in the simulations of the pool (and its cardinality). The dominant term in the cost of computation is performing a BFS from each node in each of the simulations that is truncated at distance . The total computation time is
[TABLE]
where
[TABLE]
is the average number of edges per simulation. For an IC model, . When a node is selected into the seed set we remove all nodes in its reachability set from the reachability sets of all other nodes. The removal cost can be "charged" to the initial reachability computation.
The dependence of the computation time on the graph size can be improved by using combined reachability sketches [9, 13, 14, 10] instead of maintaining the reachability sets explicitly (see Section 5.1). The sketch size needed in order to provide the required accuracy of (as in Theorem 8.1) uniformly for all subsets of size at most is . We compute a sketch for each node in each of the pools, so in total we have node sketches. The construction time of these sketches has a term linear in the total size of simulations and a term for sketch constructions which is a product of the number of pools and the construction time for each pool. The per-pool construction time is as described in Section 5.1 and is bounded by (sketch size) visits for each node, each involving reverse traversals of incoming edges of the node in some simulation. The per-pool construction time for an IC model is in expectation. The time with arbitrary simulations for pool is . In total over all pools, the construction time is dominated by , where for arbitrary simulations and for simulations generated by an IC model.
The sketches improve the computation time of greedy. Each iteration of greedy uses the (precomputed) union sketch of the current seed set in each pool. It then examines the sketches of each to compute the estimate of the averaging oracle in each pool. This operation takes in total for the iteration. Therefore iterations of greedy maximization take using the sketches. Combining the construction cost of the sketches using and and the greedy implementation over the sketches we obtain a total bound on the computation time of
[TABLE]
∎
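The dominant cost in the first bound of the proof above is a breadth-first search from each node in each simulation, truncated at a fixed distance. A minimal sketch of that step follows; the name `truncated_reachability` and the adjacency-dictionary representation of the live edges of one simulation are assumptions of this illustration.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set


def truncated_reachability(
    adjacency: Dict[Hashable, Iterable[Hashable]],
    source: Hashable,
    max_steps: int,
) -> Set[Hashable]:
    """Nodes within `max_steps` live edges of `source` (the source itself included)."""
    reached = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, distance = frontier.popleft()
        if distance == max_steps:
            continue  # do not expand beyond the truncation distance
        for neighbor in adjacency.get(node, ()):
            if neighbor not in reached:
                reached.add(neighbor)
                frontier.append((neighbor, distance + 1))
    return reached
```

Each call touches every live edge at most once, so running it from every node of a simulation costs at most the number of nodes times the number of live edges in that simulation.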
Appendix E Optimization with adaptive sample size
The pseudocode for our wrapper is provided in Algorithm 1. The inputs to the wrapper are a base algorithm and two constructions of oracles from sets of simulations. The first construction produces an oracle, , that we use for validation. The second construction produces oracles, , that are provided as input to to perform the optimization. The oracles provide an approximation of our influence function with non-uniform guarantees. For specified we use the expressions or for the number of simulations required to obtain guarantees (in the sense of Section 3.2). This gives us a relation between , , and a number of simulations. When constructing an oracle with a given number of simulations and a specified , we can determine the confidence we have from and . The oracles that we consider have the property that for a fixed , decreases at least linearly with the number of simulations. (i.e., when we double the number of simulations decreases by at least a factor of .)
The wrapper first determines an upper bound () on the maximum number of iterations it performs (based on the initial number and the simulation budget) and constructs a validation oracle that provides guarantees for a small number of sets (queries) which equals this maximum number of iterations. It then starts with a set of simulations that suffice for the oracle to provide (non-uniform) approximation guarantees. The wrapper repeats the following: It constructs an “optimization” oracle using the set of simulations and applies over to obtain a set . The wrapper terminates when is close to or when our simulation budget of is exceeded. Otherwise, we double the number of simulations in our set and repeat.
The wrapped algorithm can be an exact or approximate optimizer. It is applied to the oracle function and therefore its quality guarantees are with respect to how well the oracle value of the output set approximates the oracle optimum . The wrapper extends the approximation guarantees that provides (with respect to the oracle) to a guarantee with respect to the influence function while avoiding the worst-case number of simulations needed for a uniform approximation.
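The doubling loop described above can be sketched as follows. This is not Algorithm 1 itself, which is not reproduced here, but a simplified illustration under stated assumptions: all parameter names are hypothetical, the validation test is taken to be "validation value at least $(1-\epsilon)$ times the optimization-oracle value", and the validation simulations are drawn once up front and not counted against the budget.

```python
from typing import Any, Callable, FrozenSet, List, Tuple

Oracle = Callable[[FrozenSet[Any]], float]


def adaptive_wrapper(
    draw_simulations: Callable[[int], List[Any]],
    build_oracle: Callable[[List[Any]], Oracle],
    build_validation_oracle: Callable[[List[Any]], Oracle],
    base_algorithm: Callable[[Oracle], FrozenSet[Any]],
    initial_size: int,
    validation_size: int,
    budget: int,
    eps: float,
) -> Tuple[FrozenSet[Any], int, bool]:
    """Return (seed set, number of optimization simulations used, validated flag)."""
    validation_oracle = build_validation_oracle(draw_simulations(validation_size))
    simulations = list(draw_simulations(initial_size))
    while True:
        oracle = build_oracle(simulations)
        candidate = base_algorithm(oracle)
        # Assumed validation condition: the independent validation estimate is
        # close to the value the optimization oracle assigns to the candidate.
        if validation_oracle(candidate) >= (1 - eps) * oracle(candidate):
            return candidate, len(simulations), True
        if 2 * len(simulations) > budget:
            return candidate, len(simulations), False  # budget exhausted, no guarantee
        simulations.extend(draw_simulations(len(simulations)))  # double the set
```

Because the number of simulations doubles in each round, the total used is within a constant factor of the size at which validation first succeeds.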
We first establish some basic properties.
Lemma E.1**.**
Let be a set with maximum influence (with ). With probability at least , all the optimization oracles constructed by the wrapper have .
Proof.
The probability that fails for the first oracle is at most . The number of simulations used doubles in each iteration and all our constructions are such that the confidence parameter decreases at least linearly with the number of simulations. We therefore obtain that the sequence of failure probabilities for is geometric and sums up to at most . ∎
As an immediate corollary we obtain:
Corollary E.2**.**
Under the conditions of Lemma E.1, the oracle optimum in all iterations satisfies
[TABLE]
The following is immediate from the construction of the validation oracle.
Lemma E.3**.**
With probability at least , the validation oracle has relative error at most on all tests in which the input set is such that and absolute error at most otherwise.
Proof.
The wrapper performs at most iterations before it stops, in each iteration the validation oracle fails to provide an -approximation with probability at most . Therefore, by union bound, the probability that the algorithm fails to provide an -approximation in at least one round is at most . ∎
Lemma E.4**.**
Assume that our data and our optimization oracle with or more simulations, are such that with probability at least , the optimum of the oracle is an approximate optimizer, that is:
[TABLE]
[TABLE]
and assume that the algorithm returns the oracle optimum. Then with probability at least , the wrapper terminates after at most simulations and returns such that .
Proof.
First we show that the wrapper returns a set with the required properties with probability at least and then we show that the number of iterations the wrapper does before it stops is smaller than with probability at least .
From Lemma E.1, with probability at least in all iterations returns for which . The validation succeeds only if . From Lemma E.3 with probability at least in all iterations we have
[TABLE]
Therefore, with probability the set returned by the wrapper satisfies
[TABLE]
If then . Otherwise, we have that
We have to show that with probability at least within simulations the wrapper returns such a set to finish the proof. Consider the first iteration where . By Equations (33) and (34) with probability at least we have that and . By Lemma E.3 we have that with probability at least , the validation oracle satisfies that or . By the last two statements we have that with probability of at least :
[TABLE]
or
[TABLE]
∎
Theorem 7.1, which we restate below for ease of reading, now follows as a corollary.
Theorem E.1** (Theorem 7.1).**
Suppose that on our data the averaging (respectively, median-of-averages) oracle has the property that with simulations, with probability at least , the oracle optimum satisfies
[TABLE]
Then with probability at least , when using simulations with the median-of-averages oracle and simulations with the averaging oracle, the wrapper outputs a set such that .
Proof of Theorem 7.1.
We analyze here the number of simulations required using the averaging oracles and the median-of-averages oracles; in both cases we use median-of-averages oracles for validation. In both cases , where . By Lemma E.4 the number of simulations is at most . and get different values for each oracle.
median-of-averages oracles analysis
We have that by Lemma 6.1 and we set by Theorem 6.1. Simple calculation shows that:
[TABLE]
Therefore,
[TABLE]
averaging oracles analysis
We have and we set according to the respective worst-case guarantees on the number of simulations specified in (1). A simple calculation shows:
[TABLE]
Therefore,
[TABLE]
∎
We next consider cases where the algorithm is approximate (may not return the oracle optimizer). We assume in these cases that the optimization oracles when constructed with a given number of simulations provide, with high probability, uniform -approximation for all subsets of cardinality at most :
[TABLE]
We first show that a very weak assumption on suffices to guarantee termination with good probability.
Lemma E.5**.**
If the optimization oracle when constructed with or more simulations provides uniform -approximation with probability at least , and the algorithm returns such that , then with probability at least the wrapper terminates after using at most simulations.
Proof.
Consider the first iteration where is constructed using at least simulations. Let be the set that returns at this iteration. Since provides uniform -approximation we have that with probability at least . Combining this with our assumption we get that and by Lemma E.3 we have that with probability at least if then and if then . Combining we obtain that , and thus the validation condition holds. ∎
We next consider algorithms that guarantee some approximation ratio .
Theorem E.2**.**
Suppose that our optimization oracle when constructed with or more simulations provides uniform -approximation with probability at least . Assume now that the algorithm returns a set such that
[TABLE]
Then, the set returned by our wrapper satisfies with probability of at least .
Proof.
Consider an optimal set (with ). By Lemma E.1 with probability at least all our oracles have are within . By the assumption, the sets returned by in all iterations have . When the wrapper stops we have that and by Lemma E.3 we have with probability at least that .
Combining, we have that with probability at least ,
[TABLE]
Now, a simple calculation shows that .
∎
We can now prove Theorem 7.2 (restated for readability):
Theorem E.3 (Theorem 7.2).
If the averaging oracle has the property that with simulations, with probability at least , it provides a uniform -approximation for all subsets of size at most , then with simulations we can find in polynomial time a approximate solution with confidence .
Proof.
The averaging oracle is monotone and submodular [29] and therefore greedy can efficiently recover a set such that .
By Lemma E.5, the wrapper terminates using at most with probability at least . Applying Theorem E.2 with , we get that with probability at least . Hence, with probability at least the wrapper applied with greedy finds a -approximate solution using simulations. ∎
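The greedy step used in this proof can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes each simulation is stored as a dictionary mapping a node to the set of nodes it reaches in that simulation, and the names `average_reach` and `greedy_seed_set` are hypothetical.

```python
def average_reach(seed_set, simulations):
    """Averaging oracle: mean number of nodes reached by seed_set.

    Assumption: each simulation is a dict mapping node -> set of nodes
    reachable from it in that (deterministic) simulation.
    """
    total = 0
    for reach in simulations:
        covered = set()
        for node in seed_set:
            covered |= reach[node]
        total += len(covered)
    return total / len(simulations)


def greedy_seed_set(nodes, simulations, k):
    """Pick up to k seeds, each time adding the node with largest marginal gain."""
    chosen = []
    for _ in range(k):
        current = average_reach(chosen, simulations)
        best_node, best_gain = None, -1.0
        for node in nodes:
            if node in chosen:
                continue
            gain = average_reach(chosen + [node], simulations) - current
            if gain > best_gain:  # ties keep the earlier node in `nodes`
                best_node, best_gain = node, gain
        if best_node is None:  # no candidates left
            break
        chosen.append(best_node)
    return chosen
```

This version re-evaluates the oracle from scratch for every candidate, which keeps the sketch short; a practical implementation would cache coverage per simulation.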
Appendix F Variance bounds for dependent models
In this section we provide a proof for Corollary 4.1. We consider natural extensions of IC models: -dependence models, which allow for some dependencies between edges, and mixtures of IC and IGT models. For these extensions, we establish upper bounds of the form (8) on the variance of the reachability of a set of nodes.
We bound the variance by constructing, for each dependent model, a corresponding IC model and then applying the variance upper bound established in Section A for IC models.
For mixture models we provide a generic derivation that bounds the variance of the mixture by the variance of its components.
F.1 -dependence models
The first family we consider is -dependence models, which we define as follows. We assume that all edges with the same tail node are partitioned into disjoint groups where each group is of size at most . The edges of each group are either all active together with probability or none is active with probability . The special case where all groups are of size corresponds to an IC model (where all edges are independent).
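As an illustration of this definition, the following minimal Python sketch draws one set of live edges from a grouped model. The representation of a group as a `(tail, heads, prob)` tuple and the function name `sample_live_edges` are assumptions made for this example, not notation from the paper.

```python
import random


def sample_live_edges(groups, rng):
    """Draw one set of live edges from a grouped-dependence model.

    Assumption: groups is a list of (tail, heads, prob) tuples, where
    heads lists the head nodes of the edges in one group and prob is the
    probability that the whole group is active.
    """
    live = set()
    for tail, heads, prob in groups:
        # One coin flip per group: all edges of the group are live, or none.
        if rng.random() < prob:
            live.update((tail, head) for head in heads)
    return live
```

When every group contains a single edge, each edge gets its own independent coin flip, which matches the IC special case described above.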
Theorem F.1.
Let be a -dependence model for some . For every set we have that:
[TABLE]
Proof.
We construct an IC model from the given -dependence model . The model is defined over the set of nodes of together with an additional set of dummy nodes. The construction has the properties that the -step influence in from a set of nodes is equal to the -step influence of in . Furthermore, the variance of the size of the -step reachability of in is the same as the variance of the size of the -step reachability of in . The influence of each dummy node in is at most . The claim follows from these properties and Theorem 4.1.
Here is a formal description of our reduction.
We start by putting in the set of the nodes of . Then for every group in we do the following:
1. Add a new dummy node to , and add to the edge and give it the probability . We assign weight [math] to so that it does not contribute to the reachability of any set of nodes.
2. Create edges for every , each with probability 1.
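The two construction steps above can be sketched in Python as follows. This is a hedged illustration only. It assumes each group is given as a `(tail, heads, prob)` tuple, it labels dummy nodes as `("dummy", index)`, and it does not model node weights; the function name `reduce_to_ic` is hypothetical.

```python
def reduce_to_ic(groups):
    """Build the edge probabilities of an IC model with one dummy node per group.

    Assumption: groups is a list of (tail, heads, prob) tuples.
    Returns (edges, dummies), where edges maps (tail, head) -> probability.
    Node weights (the dummy nodes' weight in the text) are not modeled here.
    """
    edges = {}
    dummies = []
    for index, (tail, heads, prob) in enumerate(groups):
        dummy = ("dummy", index)
        dummies.append(dummy)
        # Step 1: a single edge into the dummy node carries the group's probability.
        edges[(tail, dummy)] = prob
        # Step 2: the dummy node reaches every head of the group with probability 1.
        for head in heads:
            edges[(dummy, head)] = 1.0
    return edges, dummies
```

Because the only random edge of a group is the one entering its dummy node, all heads of the group become reachable from the tail together or not at all, and different groups stay independent.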
Let be a set of nodes in . It follows from our construction that for any set of nodes the probability that is the same as the probability that . This implies that for any
[TABLE]
and
[TABLE]
Each dummy node is connected to at most original nodes, hence, is bounded by . By Theorem A.3 it follows that for every set of nodes in :
[TABLE]
Combining all these observations together, we get that
[TABLE]
∎
This theorem can be generalized to more complex dependencies. For example, it holds for any distribution on subsets of the outgoing edges from each node that we can realize by a distribution on disjoint subsets, where we draw each subset with a certain probability and take the union of the subsets that we draw.
F.2 Mixture of IC and IGT models
The second family of dependent models we consider is a mixture of IC and IGT models.
Consider a set of models for and respective probabilities such that . We define a mixture model as follows. To draw , we first draw according to probabilities and then return .
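The two-stage draw described above can be sketched as follows. This is a minimal illustration in which each component model is represented by a callable that returns one simulation; that representation and the name `sample_from_mixture` are assumptions for the example.

```python
import random


def sample_from_mixture(component_samplers, probabilities, rng):
    """Draw one simulation from a mixture model.

    Assumption: component_samplers[i] is a callable taking rng and
    returning one simulation of the i-th component model, and
    probabilities[i] is the probability of choosing that component.
    """
    # Stage 1: choose a component index according to the mixture probabilities.
    index = rng.choices(range(len(component_samplers)),
                        weights=probabilities, k=1)[0]
    # Stage 2: draw a simulation from the chosen component.
    return index, component_samplers[index](rng)
```

The chosen index is returned alongside the simulation only to make the sketch easy to inspect; the mixture model itself exposes just the simulation.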
We provide two proofs for the variance bound of the mixture. The first is direct and applies to any mixture of models that satisfy the variance bound of Theorem 4.1, and in particular to mixtures of strongly submodular SDMs. The second proof is specific to live-edge models and is based on a reduction to an IC model.
Theorem F.2.
Consider a model that is a mixture of models with probabilities that satisfy the variance bound of Theorem 4.1. Then for all ,
[TABLE]
Proof.
We first relate the influence of in the mixture model to the influence of in the components.
[TABLE]
This holds for any set and any . Therefore we also obtain the inequality
[TABLE]
It also follows that we can bound the influence values on the component by the respective ones in the mixture: and thus
[TABLE]
The random variable can be expressed as a sum of products of random variables:
[TABLE]
where are Bernoulli with probabilities . The random variables are independent of each other and also are independent from (the joint distribution of) . The variables have negative dependence as and thus the products are also negatively dependent and thus
[TABLE]
We will instead bound the variance of a surrogate random variable
[TABLE]
that has the same sum of products but with the variables being independent of each other and hence the products are also independent. We have
[TABLE]
We next express the variance of each product using variance properties of the product of two independent random variables. For :
[TABLE]
Therefore, invoking Theorem 4.1 to bound the variance for each IC model and then using (37) and (38) and finally using (40) and (44) we get
[TABLE]
∎
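The product step in the proof above uses variance properties of a product of two independent random variables. A standard identity of this kind, for a Bernoulli variable B with probability p that is independent of R, is Var(BR) = p Var(R) + p(1 - p) E[R]^2; its form may differ from the paper's displayed equation, which is not recoverable here. The following sketch checks the identity exactly on a small example, using hypothetical helper names and a dictionary representation of finite distributions.

```python
from fractions import Fraction
from itertools import product


def mean(dist):
    """Mean of a finite distribution given as {value: probability}."""
    return sum(value * prob for value, prob in dist.items())


def variance(dist):
    """Variance of a finite distribution given as {value: probability}."""
    m = mean(dist)
    return sum(prob * (value - m) ** 2 for value, prob in dist.items())


def product_distribution(dist_x, dist_y):
    """Distribution of X * Y for independent X and Y."""
    result = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        result[x * y] = result.get(x * y, Fraction(0)) + px * py
    return result


def bernoulli_product_variance(p, dist_r):
    """Closed form Var(B * R) = p Var(R) + p (1 - p) E[R]^2, B ~ Bernoulli(p)."""
    return p * variance(dist_r) + p * (1 - p) * mean(dist_r) ** 2
```

For p = 1/3 and R taking the values 1, 3, 4 with probabilities 1/2, 1/4, 1/4, both the direct computation and the closed form give 27/16.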
We next give a different proof (of a slightly different bound) for live-edge models using a reduction to an IC model. Consider a set of -steps models for and respective probabilities such that . We define a mixture model as follows. To draw , we first draw according to probabilities and then return .
Theorem F.3.
Consider a model that is a mixture of IC models with probabilities . Then for all ,
[TABLE]
Proof.
We first argue that we can assume without loss of generality that is a single node and does not contain edges that are incoming to . We can transform a general case and to this form by contracting all nodes in into a single node and deleting all edges that are incoming to . We then retain the same conditional distribution on the remaining edges. Note that this transformation preserves the distribution of and hence also its expectation and variance. The influence values of nodes can only decrease. Finally, the transformed model is also a mixture of correspondingly transformed IC models, where in each such model the distribution of remains the same and influence values can only decrease. It follows that the claimed variance bound for the transformed model implies the same bound for the original model.
We construct a new IC model with respect to (a single node) as follows. The new model has nodes , where each is a map of . We create an instantiation of each of our IC models with set of nodes and edges with the probabilities as in the model . The new IC model has a root node with weight [math] and for each , there is an edge with probability , where is the image of in the copy of . We can see that
[TABLE]
that is, the steps influence of in the constructed IC model is equal to the steps influence of in the mixture model .
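The construction of the combined IC model can be sketched as follows, under stated assumptions. Each component IC model is given as a dictionary from `(tail, head)` to an edge probability, the copy of node `v` in component `i` is labeled `(i, v)`, and the root edge into the i-th copy of the source node is assumed to carry the mixture probability of component `i`. The function name `build_combined_ic` and the label `"root"` are hypothetical, and node weights are not modeled.

```python
def build_combined_ic(component_edges, probabilities, source):
    """Combine IC components into one IC model with a new root node.

    Assumptions: component_edges[i] maps (tail, head) -> probability for the
    i-th IC model; probabilities[i] is its mixture probability and is used
    as the probability of the root edge into the i-th copy of `source`.
    """
    root = "root"
    edges = {}
    for index, (component, prob) in enumerate(zip(component_edges, probabilities)):
        # Root edge selecting (independently) the copy of this component.
        edges[(root, (index, source))] = prob
        # Disjoint copy of the component on nodes labeled (index, node).
        for (tail, head), edge_prob in component.items():
            edges[((index, tail), (index, head))] = edge_prob
    return root, edges
```

Unlike the mixture, where exactly one component is drawn, the root edges here are independent, which is the difference between the two models that the variance comparison in the proof addresses.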
We next consider the variance of the random variables and . Both these random variables are a sum of products of random variables:
[TABLE]
where are Bernoulli with probabilities . In both cases the random variables are independent of each other and also are independent from (the joint distribution of) . But in the case of the random variables are independent and hence also the products are independent and in the case of , the variables have negative dependence as and thus the products are also negatively dependent. Therefore,
[TABLE]
Finally, we bound by considering the maximum influence of a node other than in the constructed model . For we have
[TABLE]
where the last inequality follows from (40). We next consider a node that is a map of a node .
[TABLE]
The last inequality follows because for any node we have . Combining (42) and (43) we get
[TABLE]
To conclude, we invoke Theorem 4.1 for the IC model :
[TABLE]
We then apply inequalities (44) and the equality (40) to obtain the claim. ∎
References
- [1] N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. J. Comput. System Sci., 58:137–147, 1999.
- [2] E. Balkanski, A. Rubinstein, and Y. Singer. The limitations of optimization from samples. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017.
- [3] C. Borgs, M. Brautbar, J. Chayes, and B. Lucier. Maximizing social influence in nearly optimal time. In SODA, 2014.
- [4] W. Chen, L. V. S. Lakshmanan, and C. Castillo. Information and Influence Propagation in Social Networks. Morgan & Claypool, 2013.
- [5] W. Chen, W. Lu, and Y. Zhang. Time-critical influence maximization in social networks with time-delayed diffusion process. In AAAI, 2012.
- [6] W. Chen, C. Wang, and Y. Wang. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In KDD. ACM, 2010.
- [7] W. Chen, Y. Wang, and S. Yang. Efficient influence maximization in social networks. In KDD. ACM, 2009.
- [8] W. Chen, T. Lin, Z. Tan, M. Zhao, and X. Zhou. Robust influence maximization. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16. ACM, 2016.
