Sample Complexity Bounds for Influence Maximization

Gal Sadeh; Edith Cohen; Haim Kaplan

arXiv:1907.13301·cs.LG·October 30, 2019


TL;DR

This paper establishes new, tighter bounds on the number of simulations needed to effectively identify influential nodes in networks under stochastic diffusion models, improving efficiency for influence maximization tasks.

Contribution

It introduces a novel upper bound on sample complexity for influence maximization, applicable to a broad class of models including IC and LT, and offers a data-adaptive method for fewer simulations.

Findings

1. Significantly improves sample complexity bounds for influence maximization.
2. Provides a data-adaptive method reducing the number of required simulations.
3. Develops an efficient greedy algorithm for approximate maximization.

Abstract

Influence maximization (IM) is the problem of finding for a given $s\geq 1$ a set $S$ of $|S|=s$ nodes in a network with maximum influence. With stochastic diffusion models, the influence of a set $S$ of seed nodes is defined as the expectation of its reachability over simulations, where each simulation specifies a deterministic reachability function. Two well-studied special cases are the Independent Cascade (IC) and the Linear Threshold (LT) models of Kempe, Kleinberg, and Tardos. The influence function in stochastic diffusion is unbiasedly estimated by averaging reachability values over i.i.d. simulations. We study the IM sample complexity: the number of simulations needed to determine a $(1-\epsilon)$-approximate maximizer with confidence $1-\delta$. Our main result is a surprising upper bound of $O(s\tau\epsilon^{-2}\ln\frac{n}{\delta})$ for a broad class of models that…

Equations

$$\phi_{v}:2^{V\setminus\{v\}}\rightarrow\{0,1\}.$$

$$\texttt{Reach}^{t+1}(\boldsymbol{\phi},S):=\{v\in V\mid\phi_{v}(\texttt{Reach}^{t}(\boldsymbol{\phi},S))=1\}.$$

$$\texttt{I}^{\tau}(S):={\sf E}[\texttt{R}^{\tau}(S)]={\sf E}_{\boldsymbol{\phi}\sim\mathcal{G}}\left[|\texttt{Reach}^{\tau}(\boldsymbol{\phi},S)|\right].$$

$$\arg\max_{S:|S|\leq s}\texttt{I}^{\tau}(S).$$

$$\phi_{v}(T):=\texttt{Indicator}(\theta_{v}\leq f_{v}(T)).$$

$$\hat{{\sf A}}^{\tau}(T):=\frac{1}{\ell}\sum_{i=1}^{\ell}|\texttt{Reach}^{\tau}(\boldsymbol{\phi}_{i},T)|$$

$$O\left(\epsilon^{-2}sn\log\frac{n}{\delta}\right)$$

$$O\left(\epsilon^{-2}s\tau\log\frac{n}{\delta}\right)$$

$$\texttt{VReach}^{\tau}(\boldsymbol{\phi},T):=H(\texttt{Reach}^{\tau}(\boldsymbol{\phi},T)).$$

$$\texttt{VReach}^{\tau}(\boldsymbol{\phi},T):=\sum_{v\in\texttt{Reach}^{\tau}(\boldsymbol{\phi},T)}w(v).$$

$$\texttt{I}^{\tau}(T):={\sf E}[\texttt{R}^{\tau}(T)]={\sf E}_{\boldsymbol{\phi}\sim\mathcal{G}}\,\texttt{VReach}^{\tau}(\boldsymbol{\phi},T).$$

$$\text{for all }S\subset V\setminus(T\cup\{v\}),\quad\phi^{\prime}_{v}(S):=\phi_{v}(S\cup T)$$

$$\text{for all }S\subset V\setminus T,\quad H^{\prime}(S):=H(S\cup T)-H(T).$$

$$\overline{D}(S):=\frac{\sum_{v\in V}w(v)\sum_{d\leq n}d\cdot p(S,v,d)}{\texttt{I}(S)}.$$

$$\forall T\text{ such that }\texttt{I}^{\tau}(T)\geq\textsc{OPT}^{\tau}_{1},$$

$$\forall T\text{ such that }\texttt{I}^{\tau}(T)\leq\textsc{OPT}^{\tau}_{1},$$

$$\mathop{\sf Var}[\texttt{R}^{\tau}(T)]\leq c\,\texttt{I}^{\tau}(T)\max\{\texttt{I}^{\tau}(T),\max_{v\in V}\texttt{I}^{\tau}(v)\}$$

$$\mathop{\sf Var}[\texttt{R}^{\tau}(T)]\leq\tau\,\texttt{I}^{\tau}(T)\max_{v\in V\setminus T}\texttt{I}^{\tau-1}(v).$$

$$\texttt{I}^{\tau}(v)$$

$$\mathop{\sf Var}[\texttt{R}^{\tau}(v)]$$

$$\mathop{\sf Var}[\hat{{\sf A}}^{\tau}(T)]=\frac{1}{\ell}\mathop{\sf Var}[\texttt{R}^{\tau}(T)]\leq\frac{1}{\ell}c\,\texttt{I}^{\tau}(T)\max\{\texttt{I}^{\tau}(T),\textsc{OPT}^{\tau}_{1}\}.$$

$$\texttt{mA}^{\tau}(T):=\texttt{Median}_{i\in[r]}\hat{{\sf A}}^{\tau}_{i}(T)=\texttt{Median}_{i\in[r]}\texttt{Ave}_{j\in[\ell]}\texttt{VReach}^{\tau}(\boldsymbol{\phi}_{ij},T).$$

$$\Pr\left[\texttt{I}^{\tau}(T)\geq(1-2\epsilon)\textsc{OPT}^{\tau}_{s}\right]\geq 1-\delta.$$

$$T:=\arg\max_{S\mid|S|\leq s}\texttt{mA}(S).$$

$$\texttt{I}(T)\geq(1-\epsilon)\texttt{mA}(T)\geq(1-\epsilon)\texttt{mA}(S)\geq(1-\epsilon)^{2}\texttt{I}(S)\geq(1-2\epsilon)\textsc{OPT}^{\tau}_{s}.$$

$$\texttt{I}^{\tau}(T)$$

$$\hat{F}(T)\geq(1-(1-1/s)^{s})\max_{S\mid|S|\leq s}\hat{F}(S)\geq(1-1/e)\textsc{OPT}_{s}(\hat{F}).$$

$$F(u\mid S):=F(S\cup\{u\})-F(S)$$

$$\arg\max_{v\in V\setminus S}\texttt{mA}^{\tau}(\{v\}\cup S).$$

$$O\left(\epsilon^{-2}s^{3}c\ln\left(\frac{n}{\delta}\right)\overline{m}n\right),$$

$$O\left(\epsilon^{-2}s^{3}\ln\frac{n}{\delta}\left(c\overline{m}+s(m^{*}+ns)\ln n\right)\right),$$

Full text

Gal Sadeh, Tel Aviv University

Edith Cohen, Google Research and Tel Aviv University

Haim Kaplan, Google Research and Tel Aviv University

Abstract

Influence maximization (IM) is the problem of finding for a given $s\geq 1$ a set $S$ of $|S|=s$ nodes in a network with maximum influence. With stochastic diffusion models, the influence of a set $S$ of seed nodes is defined as the expectation of its reachability over simulations, where each simulation specifies a deterministic reachability function. Two well-studied special cases are the Independent Cascade (IC) and the Linear Threshold (LT) models of Kempe, Kleinberg, and Tardos [29]. The influence function in stochastic diffusion is unbiasedly estimated by averaging reachability values over i.i.d. simulations. We study the IM sample complexity: the number of simulations needed to determine a $(1-\epsilon)$ -approximate maximizer with confidence $1-\delta$ . Our main result is a surprising upper bound of $O(s\tau\epsilon^{-2}\ln\frac{n}{\delta})$ for a broad class of models that includes IC and LT models and their mixtures, where $n$ is the number of nodes and $\tau$ is the number of diffusion steps. Generally $\tau\ll n$ , so this significantly improves over the generic upper bound of $O(sn\epsilon^{-2}\ln\frac{n}{\delta})$ . Our sample complexity bounds are derived from novel upper bounds on the variance of the reachability that allow for small relative error for influential sets and additive error when influence is small. Moreover, we provide a data-adaptive method that can detect and utilize fewer simulations on models where it suffices. Finally, we provide an efficient greedy design that computes an $(1-1/e-\epsilon)$ -approximate maximizer from simulations and applies to any submodular stochastic diffusion model that satisfies the variance bounds.

1 Introduction

Models for the spread of information among networked entities were studied for decades in sociology and economics [24, 18, 27]. A diffusion process is initiated from a seed set of nodes (entities) and progresses in steps: Initially, only the seed nodes are activated. In each step additional nodes may become active based on the current set of active nodes. The progression can be deterministic or stochastic. The $t$ -stepped influence of a seed set $S$ of nodes is then defined as its expected reachability (total number of active nodes) in $t$ steps.

Influence maximization (IM) is the problem of finding a set $S$ of nodes of specified cardinality $|S|=s$ and maximum influence. The IM problem was formulated nearly two decades ago by Richardson and Domingos [16, 38], inspired by the application of viral marketing. In a seminal paper, Kempe, Kleinberg, and Tardos [29] studied stochastic diffusion models and introduced two elegant special cases, the Independent Cascade (IC) and Generalized Threshold (GT) diffusion models. Their work sparked extensive follow-up research and large-scale implementations [33, 7, 28, 37]. Currently IM is applied in multiple domains with linked entities for tasks as varied as diversity maximization (finding the most representative subset of the population) and sensor placement that maximizes coverage [30, 4, 32].

We consider stochastic diffusion models (SDM) $\mathcal{G}(V)$ over $|V|=n$ nodes that are specified by a distribution $\boldsymbol{\phi}\sim\mathcal{G}$ over sets $\boldsymbol{\phi}:=\{\phi_{v}\}_{v\in V}$ of monotone non-decreasing boolean activation functions

$$\phi_{v}:2^{V\setminus\{v\}}\rightarrow\{0,1\}.$$

A diffusion process starts with a seed set $S\subset V$ of nodes and $\boldsymbol{\phi}\sim\mathcal{G}$. At step $0$ we activate the seed nodes $\texttt{Reach}^{0}(\boldsymbol{\phi},S):=S$. The diffusion then proceeds deterministically: At step $t>0$ all active nodes remain active and we activate any inactive node $v$ where $\phi_{v}(\texttt{Reach}^{t-1}(\boldsymbol{\phi},S))=1$:

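$$\texttt{Reach}^{t+1}(\boldsymbol{\phi},S):=\{v\in V\mid\phi_{v}(\texttt{Reach}^{t}(\boldsymbol{\phi},S))=1\}.$$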

The $\tau$ -steps reachability set of a seed set $S$ is the random variable $\texttt{Reach}^{\tau}(\boldsymbol{\phi},S)$ for $\boldsymbol{\phi}\sim\mathcal{G}$ and respectively the $\tau$ -steps reachability, $\texttt{R}^{\tau}(S)$ , is the random variable that is the number of active nodes $|\texttt{Reach}^{\tau}(\boldsymbol{\phi},S)|$ for $\boldsymbol{\phi}\sim\mathcal{G}$ . Finally, the influence value of $S$ is defined to be the expectation

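$$\texttt{I}^{\tau}(S):={\sf E}[\texttt{R}^{\tau}(S)]={\sf E}_{\boldsymbol{\phi}\sim\mathcal{G}}\left[|\texttt{Reach}^{\tau}(\boldsymbol{\phi},S)|\right].$$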
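As an illustration of the definitions above, the following minimal Python sketch runs the deterministic part of a diffusion for one fixed draw of activation functions. The names `reach` and `phi_example`, and the representation of each $\phi_{v}$ as a Python callable on the current active set, are assumptions made for this example and do not come from the paper.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, Set

Node = Hashable
# Each activation function maps the current active set to True/False.
Activation = Callable[[FrozenSet[Node]], bool]


def reach(phi: Dict[Node, Activation], seeds: Iterable[Node], tau: int) -> Set[Node]:
    """Return the set of nodes that are active after at most `tau` steps."""
    active = set(seeds)                  # step 0: only the seed nodes are active
    for _ in range(tau):
        snapshot = frozenset(active)     # every activation in a step sees the same set
        newly = {v for v, f in phi.items() if v not in snapshot and f(snapshot)}
        if not newly:                    # no growth, so later steps change nothing
            break
        active |= newly                  # active nodes stay active
    return active


# Toy deterministic draw on a path a -> b -> c (hypothetical example data).
phi_example = {
    "a": lambda active: False,
    "b": lambda active: "a" in active,
    "c": lambda active: "b" in active,
}
```

Averaging `len(reach(phi, S, tau))` over independent draws of `phi` would then estimate the influence $\texttt{I}^{\tau}(S)$.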

We refer to the case where the diffusion is allowed to progress until there is no growth as unrestricted diffusion and this corresponds to $\tau=n-1$ . The influence $\texttt{I}^{\tau}(S)$ is a monotone set function. We say that an SDM is submodular when the influence function is submodular and that it is independent if the activation functions $\phi_{v}$ of different nodes are independent random variables. The IM problem for seed set size $s$ and $\tau$ steps is to find

$$\arg\max_{S:|S|\leq s}\texttt{I}^{\tau}(S).$$

The reader might be more familiar with well-studied special cases of this general formulation. Live-edge diffusion models $\mathcal{G}(V,\mathcal{E})$ are specified by a graph $(V,\mathcal{E})$ with $|V|=n$ nodes and $|\mathcal{E}|=m$ directed edges and a distribution $E\sim\mathcal{G}$ over subsets $E\subset\mathcal{E}$ of "live" edges. When expressed as an SDM, the activation functions that correspond to $E$ have $\phi_{v}(T)=1$ if and only if there is an edge from a node in $T$ to $v$ in the graph $(V,E)$. Live-edge models are always submodular: This is because $|\texttt{Reach}^{\tau}(E,S)|$, which is the number of nodes reachable from $S$ in $(V,E)$ by paths of length at most $\tau$, is a coverage function and hence monotone and submodular. Therefore, so is the influence function $\texttt{I}^{\tau}(S)$, which is an expectation over a distribution of coverage functions. A live-edge model is independent if we only have dependencies between incoming edges to the same node. The Independent Cascade (IC) model is the special case of an independent live-edge model where all edges $e\in\mathcal{E}$ are independent Bernoulli random variables selected with probabilities $p_{e}$ ($e\in\mathcal{E}$).
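To make the live-edge view concrete, here is a small Python sketch of one IC simulation followed by a depth-limited search. The function names and the `edge_probs` dictionary (mapping a directed edge `(u, v)` to $p_{e}$) are hypothetical choices for this illustration only.

```python
import random
from collections import deque


def sample_live_edges(edge_probs, rng):
    """Draw one IC simulation: keep each edge independently with probability p_e."""
    return [e for e, p in edge_probs.items() if rng.random() < p]


def reach_live_edges(live_edges, seeds, tau):
    """Nodes reachable from `seeds` by directed paths of at most `tau` live edges."""
    out = {}
    for u, v in live_edges:
        out.setdefault(u, []).append(v)
    dist = {s: 0 for s in seeds}         # BFS distance from the seed set
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        if dist[u] == tau:               # do not expand beyond tau steps
            continue
        for v in out.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)
```

Because breadth-first search visits nodes in order of distance, pruning at depth `tau` returns exactly the nodes within `tau` live edges of some seed.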

Another well-studied class are generalized threshold (GT) models [29, 33]. A GT model $\mathcal{G}(V,\boldsymbol{f})$ is specified by a set $\boldsymbol{f}:=\{f_{v}\}_{v\in V}$ of monotone functions $f_{v}:2^{V}\rightarrow[0,1]$ . The randomization is specified by a set of threshold values $\boldsymbol{\theta}\sim\mathcal{G}$ where $\boldsymbol{\theta}:=\{\theta_{v}\}_{v\in V}$ . The corresponding activation functions to $\boldsymbol{\theta}$ are

$$\phi_{v}(T):=\texttt{Indicator}(\theta_{v}\leq f_{v}(T)).$$

A well-studied subclass are Independent GT (IGT) where we require that $\boldsymbol{f}$ are submodular and nodes $v\in V$ have independent threshold values $\theta_{v}\sim U[0,1]$ . Mossel and Roch [33, 34] proved that IGT models are submodular, which is surprising because the functions $|\texttt{Reach}^{\tau}(\boldsymbol{\phi},S)|$ are generally not submodular. Their proof was provided for unrestricted diffusion but extends to the case where we stop the process after $\tau$ steps. Finally, Linear threshold (LT) models [24, 29] are a special case of IGT where we have an underlying directed graph and each edge $(u,v)$ is associated with a fixed weight value $b_{uv}\geq 0$ so that for all $v\in V$ $\sum_{u}b_{uv}\leq 1$ and the functions are defined as the sums $f_{v}(A):=\sum_{u\in A\cap N(v)}b_{uv}$ . Kempe et al showed [29] that each LT model is equivalent to an independent live-edge model.
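The LT special case can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: `weights`, `theta`, `lt_reach`, and `sample_thresholds` are names invented here, and the code assumes (without checking) that the weights entering each node sum to at most 1.

```python
import random


def sample_thresholds(nodes, rng):
    """Draw independent thresholds theta_v ~ U[0, 1]; one draw is one LT simulation."""
    return {v: rng.random() for v in nodes}


def lt_reach(weights, theta, seeds, tau):
    """tau-step reachability in an LT model for one fixed threshold draw `theta`.

    weights[(u, v)] holds the edge weight b_uv >= 0.
    """
    incoming = {}
    for (u, v), b in weights.items():
        incoming.setdefault(v, []).append((u, b))
    active = set(seeds)
    for _ in range(tau):
        newly = set()
        for v in theta:
            if v in active:
                continue
            # f_v(A): total weight of edges into v from currently active nodes
            f_v = sum(b for u, b in incoming.get(v, []) if u in active)
            if theta[v] <= f_v:          # activation rule Indicator(theta_v <= f_v(A))
                newly.add(v)
        if not newly:
            break
        active |= newly
    return active
```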

One of the challenges brought on by the IM formulation is computational efficiency. Kempe et al. [29] noted that the IM problem generalizes the classic Max Cover problem even with $\tau=1$ and a live-edge model with a fixed set of live edges ($p_{e}=1$ for all $e\in\mathcal{E}$). Therefore, IM inherits Max Cover's hardness of approximation for ratio better than $1-(1-1/s)^{s}\geq 1-1/e$ [19] for a cover with $s$ sets. On the positive side, with submodular models, an approximation ratio of $1-(1-1/s)^{s}$ can be achieved by the first $s$ nodes of a greedy sequence generated by sequentially adding a node with maximum marginal value [35]. A challenge of applying Greedy with stochastic models, however, is that even point-wise evaluation of the influence function can be computationally intensive. Exact evaluation even for IC models is #P-hard [6]. As for approximation, Kempe et al. proposed to work with averaging oracles

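$$\hat{{\sf A}}^{\tau}(T):=\frac{1}{\ell}\sum_{i=1}^{\ell}|\texttt{Reach}^{\tau}(\boldsymbol{\phi}_{i},T)|$$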

that average the reachability values obtained from a set $\{\boldsymbol{\phi}_{i}\}_{i=1}^{\ell}$ of i.i.d. simulations. Recall that in the general SDM formulation, a simulation is specified by a set $\boldsymbol{\phi}$ of node activation functions. For live-edge models, a simulation is simply a set of concurrently live edges $E$ . In GT models, a simulation is specified by a set of thresholds $\boldsymbol{\theta}$ .
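A minimal sketch of such an averaging oracle in Python follows. The function `averaging_oracle` and its `reach_size` callback are hypothetical names; the toy data assumes a live-edge setting in which each simulation has been precomputed as a map from a node to its $\tau$-step reachable set, so that the reachable set of a seed set is the union over its members.

```python
from statistics import fmean


def averaging_oracle(simulations, reach_size, seed_set):
    """Average the reachability of `seed_set` over the given i.i.d. simulations."""
    return fmean(reach_size(sim, seed_set) for sim in simulations)


def union_reach_size(sim, seed_set):
    """Reachability of a seed set when `sim[v]` is the reachable set of node v."""
    return len(set().union(*(sim[v] for v in seed_set)))


# Two hypothetical simulations over nodes a, b, c.
toy_simulations = [
    {"a": {"a", "b"}, "b": {"b"}},
    {"a": {"a"}, "b": {"b", "c"}},
]
```

For example, `averaging_oracle(toy_simulations, union_reach_size, ["a"])` averages the reach sizes 2 and 1.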

The averaging oracle has some appealing properties: First, it is robust compared to estimators tailored to models that satisfy specific assumptions (see related work section) in that for any diffusion model $\mathcal{G}$ , also with complex and unknown dependencies (between activation functions of different nodes or between edges in live-edge models), for any set $S$ , $\hat{{\sf A}}(S)$ is an unbiased estimate of the exact influence value $\texttt{I}^{\tau}(S)$ and estimates are accurate as long as the variance of $\texttt{R}^{\tau}(S)$ is "sufficiently" small. Second, in terms of practicality, the oracle is directly available from simulations and does not require learning or inferring the underlying diffusion model that generated the data [39, 22, 21]. Therefore, the results are not sensitive to modeling assumptions and learning accuracy [8, 25]. Often, estimation of model parameters requires a large number of simulations: Even for IC models, Example 1.2 shows that edges with tiny probabilities that require many simulations to estimate can be critical for IM accuracy. Third, in terms of computation, when the reachability functions $\texttt{Reach}^{\tau}(\boldsymbol{\phi},T)$ are monotone and submodular (as is the case with live-edge models), so is their average $\hat{{\sf A}}$ , and hence the oracle optimum can be approximated by the greedy algorithm. Prior work addressed the efficiency of working with averaging oracles by improving the efficiency of greedy maximization [30, 23] and applied sketches [9] to efficiently estimate $\hat{{\sf A}}(S)$ values [7, 13].

The fundamental question we study here is the sample complexity of IM, that is, the number of i.i.d. simulations needed to recover an approximate maximizer of the influence function $\texttt{I}^{\tau}$ . Formally, for parameters $(\epsilon,\delta)$ , identify a seed set $T$ of size $s$ so that $\Pr\left[\texttt{I}^{\tau}(T)\geq(1-\epsilon)\textsc{OPT}^{\tau}_{s}\right]\geq 1-\delta$ , where $\textsc{OPT}^{\tau}_{s}:=\max_{S\mid|S|\leq s}\texttt{I}^{\tau}(S)$ is the exact maximum. Note that the recovery itself is generally computationally hard and the sample complexity only considers the information we can glean from a set of simulations.

Kempe et al provided an upper bound of

$$O\left(\epsilon^{-2}sn\log\frac{n}{\delta}\right)\tag{1}$$

on the sample complexity of the harder Uniform Relative-Error Estimation (UREE) problem where for a given $(\epsilon,\delta)$ we bound the number of simulations so that with probability $1-\delta$ , for all subsets $S$ such that $|S|\leq s$ , $\hat{{\sf A}}(S)$ approximates $\texttt{I}^{\tau}(S)$ within relative error of $\epsilon$ . The sample complexity of UREE upper bounds that of IM because the oracle maximizer $\arg\max_{S\mid|S|\leq s}\hat{{\sf A}}(S)$ must be an approximate maximizer. We provide the argument for (1) here because it is basic and broadly applies to all SDMs: The reachability values $\texttt{Reach}^{\tau}(\boldsymbol{\phi},S)$ , and hence their expectation, $\texttt{I}^{\tau}(S)$ have values in $[1,n]$ . Using the multiplicative Chernoff bound (with values divided by $n$ ) we obtain that $O(\epsilon^{-2}n\ln\delta^{-1})$ simulations guarantee a relative error of $\epsilon$ with probability at least $(1-\delta)$ for the estimate of any particular set $S$ . Interestingly, this bound is tight for point-wise estimation even for IC models: Example 1.1 shows a family of models where $\tau=2$ and $\Omega(\epsilon^{-2}n)$ simulations are required for estimating the influence value of a single node. The UREE sample complexity bound (1) follows from applying a union bound over all $\binom{n}{s}=O(n^{s})$ subsets.
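The argument above can be written out as a short calculation. The constant 3 below comes from one standard form of the two-sided multiplicative Chernoff bound for $0<\epsilon\leq 1$ and is an assumption of this sketch; the symbols $\mu$, $\delta'$, and $N$ are introduced here for the derivation only.

```latex
% Scale each simulation to X_i := |Reach^tau(phi_i, S)| / n, so X_i is in [1/n, 1]
% and mu := E[X_i] = I^tau(S) / n >= 1/n.
\Pr\left[\left|\hat{A}^{\tau}(S)-I^{\tau}(S)\right|\geq\epsilon I^{\tau}(S)\right]
  \leq 2\exp\left(-\frac{\epsilon^{2}\ell\mu}{3}\right)
  \leq 2\exp\left(-\frac{\epsilon^{2}\ell}{3n}\right)

% Requiring the right-hand side to be at most delta' for a single set S gives
\ell\geq 3n\epsilon^{-2}\ln\frac{2}{\delta'}

% Union bound over the N = O(n^s) candidate sets with delta' = delta / N:
\ell = O\left(\epsilon^{-2}n\left(s\ln n+\ln\frac{1}{\delta}\right)\right)
     = O\left(\epsilon^{-2}sn\ln\frac{n}{\delta}\right)
```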

The generic upper bound has prohibitive linear dependence on the number of nodes $n$ (that Example 1.1 shows is unavoidable for UREE even for IC models). A simple example shows that we can not hope for an umbrella improvement for IM: Consider the star graph family of Example 1.2 when edges are dependent so that either all edges are live or none is. Clearly $n/100$ simulations are necessary to detect a 1-step approximate maximizer (which must be the actual maximizer). The remaining hope is that we can obtain stronger bounds on the IM sample complexity for models with weaker or no dependencies such as the IC and IGT models. This question eluded researchers for nearly two decades.

Contributions and overview

We study the sample complexity of influence maximization from averaging oracles computed from i.i.d. simulations. One of our main contributions is an upper bound of

$$O\left(\epsilon^{-2}s\tau\log\frac{n}{\delta}\right)\tag{2}$$

on the IM sample complexity of independent strongly submodular SDMs. Informally, strong submodularity means that the influence function of any “reduced” model (model derived from original one by setting a subset $T\subset V$ of nodes as active) is submodular. The IC and IGT models are special cases of strongly submodular independent SDMs.

Interestingly, we provide similar sample complexity bounds for natural families of models that are not independent: Mixtures of small number of strongly submodular SDMs and what we call $b$ -dependence live-edge models that allow for positive dependence of small groups of edges with a shared tail node.

Our bound improves over prior work by replacing the prohibitive linear dependence in the number of nodes $n$ in (1) with the typically much smaller value $\tau$ . While on worst-case instances unrestricted diffusions may require $\Omega(n)$ steps, understanding the sample complexity in terms of $\tau$ is important: First, IM with explicit step limits [5, 31, 20, 17, 14], is studied for applications where activation time matters. Moreover, due to the “small world” phenomenon [42], in “natural” settings we can expect most activations (even with unrestricted diffusions) to occur within a small number of steps. In the latter case, unrestricted influence values are approximated well by corresponding step-limited influence with $\tau\ll n$ .

Our improvement is surprising as generally a linear-in- $n$ number of simulations is necessary for estimating influence values of some nodes or to estimate essential model parameters (for example, the edge probabilities in IC models), and this is the case even when $\tau$ is very small. This shows that the maximization problem is in an information sense inherently easier and can circumvent these barriers.

We overview our results and implications – complete proofs can be found in the appendix. We review related work in Section 2 and place it in the context of our results. In Section 3 we formulate quality measures for influence oracles and relate unrestricted and step-limited influence. In particular, we observe that for IM it suffices that the oracle provides good estimates (within a small relative error) of larger influence values. This allows us to circumvent the lower bound for point-wise relative-error estimates shown in Example 1.1.

In Section 4 we state our main technical result that upper bounds $\mathop{\sf Var}[\texttt{R}^{\tau}(T)]$ by $\tau\texttt{I}^{\tau}(T)\max_{v\in V\setminus T}\texttt{I}^{\tau-1}(v)$ for independent strongly submodular SDMs. This variance upper bound facilitates estimates with small relative error for sets with larger influence values and additive error for sets with small influence values. We also provide a family of IC models that shows that the linear dependence on $\tau$ in the variance bound is necessary. We derive similar variance bounds for mixtures of strongly submodular independent SDMs and for $b$-dependence models. All our subsequent sample complexity bounds apply generically to any SDM (submodular/independent or not) that satisfies variance bounds of this form. In Section 5 we review averaging oracles and bound the sample complexity using variance upper bounds. In Section 6 we present our median-of-averages oracle that amplifies the confidence guarantees of the averaging oracle and facilitates a tighter sample complexity bound.
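The median-of-averages idea can be sketched as follows: split the per-simulation reachability values of a set into $r$ batches, average each batch, and return the median of the batch averages. The function name and the equal-size consecutive batching are assumptions of this sketch, not the paper's exact construction.

```python
from statistics import fmean, median


def median_of_averages(values, r):
    """Median of the means of `r` equal-size consecutive batches of `values`."""
    ell = len(values) // r               # simulations per batch
    if ell == 0:
        raise ValueError("need at least r values")
    batch_means = [fmean(values[i * ell:(i + 1) * ell]) for i in range(r)]
    return median(batch_means)
```

On the values `[1, 1, 1, 2, 2, 2, 100, 100, 100]` with `r = 3`, the batch means are 1, 2, and 100, so the median is 2, whereas the plain average is pulled above 34 by the one outlying batch. This is the sense in which taking a median amplifies confidence.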

In Section 7 we provide a data-adaptive framework that provides guarantees while avoiding the worst-case sample complexity upper bounds on models when a smaller number of simulations suffices.

In Section 8 we consider computational efficiency and present a greedy maximization algorithm based on our median-of-averages oracles that returns a $(1-(1-1/s)^{s}-\epsilon)$-approximate maximizer with probability $1-\delta$. The design generically applies to any SDM with a submodular influence function that satisfies the variance bounds.
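For orientation, plain greedy selection over an arbitrary set-function oracle looks like the sketch below. It is the textbook greedy loop only; it does not include the error control of the paper's Section 8 design, and `greedy_seed_set`, `oracle`, and the toy coverage data are hypothetical.

```python
def greedy_seed_set(nodes, s, oracle):
    """Pick up to `s` nodes, each time adding the node with the largest oracle value."""
    chosen = []
    for _ in range(min(s, len(nodes))):
        candidates = [v for v in nodes if v not in chosen]
        # Ties are broken by the order of `nodes`, since max keeps the first maximum.
        best = max(candidates, key=lambda v: oracle(chosen + [v]))
        chosen.append(best)
    return chosen


# Toy coverage oracle standing in for an influence oracle.
toy_cover = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}}


def toy_oracle(seed_list):
    return len(set().union(*(toy_cover[v] for v in seed_list)))
```

With the toy data, greedy first takes `c` (covering four items) and then `a` (three new items, against one for `b`).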

2 Related work

Our focus here is on influence estimates obtained from averages of i.i.d. simulations of a model. We note that alternative approaches can be more effective for specific families of models. In particular, for IC models, state-of-the-art large-scale greedy approximate maximization algorithms [41, 40, 36, 26] are not based on simulation averages. Their estimates are also obtained by building for each node a sample of its "influence set", but they use a finer building block of i.i.d. Reverse Reachability (RR) searches. The random RR search method was proposed in [9] to estimate the sizes of reachability sets in graphs, and Borgs et al. [3] adapted it to IC models.

The method can be applied in principle for any live-edge model: A basic RR search is conducted by selecting a node $v\in V$ uniformly at random and performing a BFS search on reversed edges that is pruned at length $\tau$ . The search "flips" edges as their head node is reached according to conditional distribution on $\mathcal{G}$ . The index number of the RR search is then added to the sample set of each node that is "reached" by the search. Influence of a subset $S$ can then be unbiasedly estimated from the cardinality of the union of the samples of nodes $v\in S$ and the greedy algorithm can be applied to the sets of samples for approximate maximization. To obtain an approximate influence maximizer we need to perform RR searches until some node has a sample of size $O\left(\epsilon^{-2}s\log(n/\delta)\right)$ . In the worst case, this requires $O\left(\epsilon^{-2}sn\log(n/\delta)\right)$ RR searches. For general live-edge models, an independent RR search can always be obtained from a simulation $E\sim\mathcal{G}$ by randomly drawing a node and performing a reverse search from it using edges $E$ . The same simulation, however, can not generally be reused to generate multiple independent RR searches. This way of obtaining RR searches works for general live-edge models (with arbitrary dependencies) but requires $O(\epsilon^{-2}ns\log(n/\delta))$ simulations, which does not improve over the generic upper bound (1).
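The basic RR search described above can be sketched for an IC model with known edge probabilities. This is an illustrative sketch under stated assumptions: `in_edges[v]` is a hypothetical list of `(u, p_uv)` pairs for edges entering `v`, utility is node count, and the estimator counts the RR searches that touch the seed set, which is equivalent to the size of the union of the per-node samples.

```python
import random


def rr_search(in_edges, nodes, tau, rng):
    """One reverse-reachability search: nodes that reach a random root within tau steps."""
    root = rng.choice(nodes)             # uniformly random target node
    reached = {root}
    frontier = [root]
    for _ in range(tau):                 # reverse BFS pruned at depth tau
        next_frontier = []
        for v in frontier:
            for u, p in in_edges.get(v, []):
                # Edge (u, v) is flipped only now, when its head v has been reached.
                if u not in reached and rng.random() < p:
                    reached.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return reached


def estimate_influence(rr_sets, seed_set, n):
    """Estimate influence as n times the fraction of RR sets that intersect the seeds."""
    hits = sum(1 for rr in rr_sets if not rr.isdisjoint(seed_set))
    return n * hits / len(rr_sets)
```

Each node leaves the frontier once, so every edge is flipped at most once per search, which is what keeps the search consistent with a single simulation of an independent model.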

The appeal of the RR search (RRS) method is that it can be implemented very efficiently for independent live-edge (including IC or LT) models. The total work performed requires only $O(\epsilon^{-2}ms\log(n/\delta))$ "edge flips" that can be easily performed using specified edge probabilities $p_{e}$ for IC models. Moreover, the basic building blocks of RR searches are local simulations of sets of incoming edges of specified nodes, and the full computation requires at most $O(\epsilon^{-2}s\log(n/\delta))$ local simulations for each node. When we have full simulations generated by an independent live-edge model, these "local" simulations are independent and the required number of "local simulations" can be obtained by decomposing $O(\epsilon^{-2}s\log(n/\delta))$ full simulations. The caveat is that this approach breaks the coherence of simulations, as we construct each RR search from components taken from multiple simulations. These "efficient" implementations (i.e., based on decomposed simulations or edge flips according to marginal probabilities) may "catastrophically fail" when dependencies exist: The influence estimates obtained are biased and cause large errors even when the variance is low. Example 2.1 shows a simple mixture model (of two degenerate IC models) where "efficient" RRS has large error due to bias but averages of few simulations provide accurate estimates.

To summarize, with RRS, the implementation that works with full simulations is robust to dependencies but is inefficient, and the efficient implementation breaks ungracefully even with light dependencies. Thus we believe that both basic approaches to approximate IM, simulation averages and RRS, offer distinct advantages: Simulation averages are robust in that they remain unbiased and are accurate on any SDM, including dependent ones, for which the variance is sufficiently small, whereas RRS offers more efficiency with purely independent live-edge models.

3 Preliminaries

We consider stochastic diffusion models $\mathcal{G}(V)$ as outlined in the introduction. We denote by $\texttt{Reach}^{\tau}(\boldsymbol{\phi},T)$ the $\tau$ -steps reachability set of $T$ when we use a specific set $\boldsymbol{\phi}$ of activation functions. We will use the notation $\texttt{Reach}^{\tau}(T)$ (with the parameter $\boldsymbol{\phi}$ omitted) for the random variable $\texttt{Reach}^{\tau}(\boldsymbol{\phi},T)$ obtained when we draw $\boldsymbol{\phi}\sim\mathcal{G}$ according to the model.

Utility functions

For simplicity, the discussion in the introduction took the utility of a reachable set to be the number of reachable nodes $\texttt{VReach}^{\tau}(\boldsymbol{\phi},T):=|\texttt{Reach}^{\tau}(\boldsymbol{\phi},T)|$ . Generally, we can consider utility functions $H:2^{V}\rightarrow\Re_{+}$ that are nonnegative monotone non-decreasing with $H(\emptyset)=0$ :

$$\texttt{VReach}^{\tau}(\boldsymbol{\phi},T):=H(\texttt{Reach}^{\tau}(\boldsymbol{\phi},T)).\tag{3}$$

Submodular utility is particularly natural and studied by Mossel and Roch [33]. Additive utility is the special case where nodes have nonnegative weights $w:V\rightarrow\mathcal{R}^{+}$ and

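$$\texttt{VReach}^{\tau}(\boldsymbol{\phi},T):=\sum_{v\in\texttt{Reach}^{\tau}(\boldsymbol{\phi},T)}w(v).\tag{4}$$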

We consider a diffusion model $\mathcal{G}(V,H)$ together with a utility function $H$ . The random variable $\texttt{R}_{\mathcal{G}}^{\tau}(T)$ is the utility of the reachable set, that is, $\texttt{VReach}^{\tau}(\boldsymbol{\phi},T)$ when $\boldsymbol{\phi}\sim\mathcal{G}$ . The influence function is then the expected utility of the reachable set

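$$\texttt{I}^{\tau}(T):={\sf E}[\texttt{R}^{\tau}(T)]={\sf E}_{\boldsymbol{\phi}\sim\mathcal{G}}\,\texttt{VReach}^{\tau}(\boldsymbol{\phi},T).$$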

We denote the maximum influence value of a subset of cardinality $s$ by $\textsc{OPT}^{\tau}_{s}:=\max_{S:|S|\leq s}\texttt{I}^{\tau}(S)$ .

It follows from the definition that for any SDM $\mathcal{G}(V,H)$ with utility $H$ , the influence $\texttt{I}^{\tau}(T)$ is monotone non-decreasing in $\tau$ and in the set $T$ and the optimum values $\textsc{OPT}^{\tau}_{s}$ are non-decreasing in $\tau$ and $s$ . Generally, influence functions $\texttt{I}^{\tau}(T)$ of SDMs may not be submodular even when utility is additive. The influence function is submodular for live-edge and for IGT models [33] with submodular utility.

Reduced models

We work with the following notion of model reduction. Let $\mathcal{G}(V,H)$ be an independent SDM with submodular utility. For a set of nodes $T\subset V$, we define the reduced model $\mathcal{G}^{\prime}(V^{\prime},H^{\prime})$ of $\mathcal{G}$ with respect to $T$: The reduced model contains the nodes $V^{\prime}=V\setminus T$. The activation functions $\phi^{\prime}_{v}\sim\mathcal{G}^{\prime}$ for $v\in V\setminus T$ are obtained by drawing $\phi_{v}\sim\mathcal{G}$ conditioned on $\phi_{v}(T)=0$ and taking

$$\text{for all }S\subset V\setminus(T\cup\{v\}),\quad\phi^{\prime}_{v}(S):=\phi_{v}(S\cup T)$$

(Note that since we have independent SDM we can separately consider the distribution of activation functions of each node). The utility in $\mathcal{G}^{\prime}$ is the marginal utility in $\mathcal{G}$ with respect to $T$ :

$$\text{for all }S\subset V\setminus T,\quad H^{\prime}(S):=H(S\cup T)-H(T).$$

The reduced model $\mathcal{G}^{\prime}(V^{\prime},H^{\prime})$ is also an independent SDM with submodular utility: Activation functions $\boldsymbol{\phi}^{\prime}\sim\mathcal{G}^{\prime}$ are independent and monotone, and the utility is monotone with $H^{\prime}(\emptyset)=0$ and submodular.

Strongly submodular SDM

We say that an independent SDM $\mathcal{G}(V,H)$ is strongly submodular if the utility function $H$ is submodular and the influence function $\texttt{I}^{\tau}_{\mathcal{G}^{\prime}}$ is submodular with any reduced model $\mathcal{G}^{\prime}$ and step limit $\tau\geq 0$ . IC and IGT models are strongly submodular SDMs (see Theorem A.2).

The variance and thus sample complexity upper bounds that we present in the sequel apply to any strongly submodular SDM. We will also provide bounds for some dependent families of models. One family is a slight generalization of IC models that we refer to as $b$ -dependence. Here edges are partitioned into disjoint groups, where each group contains at most $b$ edges emanating from the same node. The edges in a group must be either all live or none live (are positively dependent).

3.1 Relating step-limited and unrestricted Influence

When unrestricted diffusion from a seed set $S$ is such that most activations occur within $\tau$ steps, the unrestricted influence $\texttt{I}(S)$ is approximated well by $\tau$ -step influence $\texttt{I}^{\tau}(S)$ . We can also relate unrestricted influence with small expected steps-to-activation to step-limited influence: For a seed set $S$ , node $v$ , and length $d$ , we denote by $p(S,v,d)$ the probability that node $v$ is activated in a diffusion from $S$ in step $d$ . For additive utility functions (4) by definition, $\texttt{I}^{\tau}(S)=\sum_{v\in V}w(v)\sum_{d\leq\tau}p(S,v,d)$ . The expected length of an activation path from $S$ (in unrestricted diffusion) is:

[TABLE]

The following lemma is an immediate consequence of Markov’s inequality and shows that $\tau$ -stepped influence with $\tau=O(\overline{D}(S))$ approximates well the unrestricted influence:

Lemma 3.1.

For all $S$ and $\epsilon>0$ , $\texttt{I}^{\overline{D}(S)\epsilon^{-1}}(S)\geq(1-\epsilon)\texttt{I}(S)$ .
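A short derivation sketch for the additive-utility case with nonnegative weights. It assumes that $\overline{D}(S)$ is the utility-weighted mean activation step, that is, $\overline{D}(S)\,\texttt{I}(S)=\sum_{v}w(v)\sum_{d}d\,p(S,v,d)$ ; this normalization is our reading of the definition and should be checked against it. Write $\tau=\overline{D}(S)\epsilon^{-1}$ .

```latex
\begin{align*}
\texttt{I}(S)-\texttt{I}^{\tau}(S)
  &= \sum_{v\in V} w(v) \sum_{d>\tau} p(S,v,d) \\
  &\leq \sum_{v\in V} w(v) \sum_{d>\tau} \frac{d}{\tau}\, p(S,v,d)
     && \text{since } d/\tau > 1 \text{ for } d>\tau \\
  &\leq \frac{1}{\tau} \sum_{v\in V} w(v) \sum_{d\geq 0} d\, p(S,v,d)
   = \frac{\overline{D}(S)}{\tau}\,\texttt{I}(S)
   = \epsilon\,\texttt{I}(S).
\end{align*}
```

Rearranging gives $\texttt{I}^{\tau}(S)\geq(1-\epsilon)\texttt{I}(S)$ , which is the Markov-style argument the lemma refers to.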

3.2 Influence Oracles

We say that a set function $\hat{F}$ is an $\epsilon$ -approximation of another set function $F$ at a point $T$ if $\left|\hat{F}(T)-F(T)\right|\leq\epsilon\max\{F(T),\textsc{OPT}_{1}(F)\}$ , where $\textsc{OPT}_{s}(F):=\max_{S\mid|S|\leq s}F(S)$ . That is, the estimate $\hat{F}$ has a small relative error for sets $T$ with $F(T)\geq\textsc{OPT}_{1}(F)$ and a small absolute error of $\epsilon\textsc{OPT}_{1}(F)$ for sets $T$ with $F(T)\leq\textsc{OPT}_{1}(F)$ . We say that $\hat{F}$ provides a uniform $\epsilon$ -approximation for all subsets $T$ in a collection $C$ if $\hat{F}$ is an $\epsilon$ -approximation for all $T\in C$ .

An influence oracle, $\hat{\texttt{I}}^{\tau}$ , is a randomized data structure that is constructed from a set of i.i.d. simulations of a model. The influence oracle, $\hat{\texttt{I}}^{\tau}$ , defines a set function (we use the same name $\hat{\texttt{I}}^{\tau}$ for the set function) that for any input query set $T\subset V$ , returns a value $\hat{\texttt{I}}^{\tau}(T)$ . For $\epsilon<1$ and $\delta<1$ we say that an oracle provides $(\epsilon,\delta)$ approximation guarantees with respect to $\texttt{I}^{\tau}$ if for any set $T$ it is an $\epsilon$ -approximation with probability at least $1-\delta$ . That is

[TABLE]

where $\textsc{OPT}^{\tau}_{1}:=\textsc{OPT}_{1}(\texttt{I}^{\tau})$ . Example 1.1 shows that this type of requirement is what we can hope for with an oracle that is constructed from a small number of simulations.

The $(\epsilon,\delta)$ requirements are for each particular set $T$ . If we are interested in the stronger guarantee that with probability $1-\delta$ the approximation holds uniformly for all sets in a collection $\mathcal{C}$ , we can use an oracle that provides $(\epsilon,\delta_{A}=\delta/|\mathcal{C}|)$ guarantees. The $\epsilon$ -approximation guarantee for all sets in $\mathcal{C}$ then follows from a union bound argument: The probability that at least one of the $|\mathcal{C}|$ sets is not approximated correctly is at most $|\mathcal{C}|\delta_{A}\leq\delta$ .

4 Variance Bounds

We consider upper bounds on the variance $\mathop{\sf Var}\left[\texttt{R}^{\tau}(T)\right]$ of the reachability of a set of nodes $T$ that have the following particular form

[TABLE]

for some $c\geq 1$ . The sample complexity bounds we present in the sequel apply to any SDM that satisfies these bounds. In the remaining part of this section we state our variance upper bound for strongly submodular SDMs and extensions and a tight worst-case lower bound for IC models.

4.1 Variance upper bound

The following key theorem facilitates our main results. We show that any strongly submodular SDM satisfies the bound (8) with $c=\tau$ . The proof is technical and provided in Appendix A.

Theorem 4.1 (Variance Upper Bound Lemma).

Let $\mathcal{G}(V,H)$ be a strongly submodular SDM. Then for any step limit $\tau\geq 0$ , and a set $T\subset V$ of nodes we have

[TABLE]

Some natural dependent SDMs have a variance bound of the form (8): (See Appendix F for proofs.)

Corollary 4.1.

IC models with $\tau$ -steps and $b$ -dependence satisfy the bound (8) with $c=2b\tau$ . Any mixture of $\tau$ -steps strongly submodular SDMs where each model has probability at least $p$ satisfies the bound (8) with $c=(\tau+1)/p$ .

4.2 Variance lower bound

We provide a family of IC models for which this variance upper bound is asymptotically tight. This shows that the dependence of the variance bound on $\tau$ is necessary.

Theorem 4.2 (Variance Lower Bound).

For any $\tau>0$ there is an IC model $\mathcal{G}^{\tau}=(V,\mathcal{E})$ with a node $v\in V$ of maximum influence such that $\mathop{\sf Var}[\texttt{R}^{\tau}(v)]\geq\frac{1}{12}\tau\texttt{I}^{\tau}(v)^{2}$ .

Our family of models $\mathcal{G}^{\tau}=(V,\mathcal{E})$ are such that $(V,\mathcal{E})$ is a complete directed binary tree of depth $\tau\geq 1$ rooted at $v\in V$ with all edges directed away from the root and $p_{e}=1/2$ for all $e\in\mathcal{E}$ . We show (details in Appendix B) that:

[TABLE]
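The tree construction can be checked with exact arithmetic. The sketch below assumes unit node weights (reachability counts nodes, including the root) and uses the recursion $R_{d}=1+B_{1}R^{\prime}+B_{2}R^{\prime\prime}$ , where $R^{\prime},R^{\prime\prime}$ are independent copies of $R_{d-1}$ and $B_{1},B_{2}$ are independent edge indicators; the function name `tree_moments` is ours.

```python
from fractions import Fraction

def tree_moments(tau, p=Fraction(1, 2)):
    """Exact mean and variance of the number of nodes reached from the root of a
    complete binary tree of depth tau when every edge is live with probability p."""
    mean, var = Fraction(1), Fraction(0)  # depth 0: only the root
    for _ in range(tau):
        # For B ~ Bernoulli(p) independent of R: Var[B*R] = p*Var[R] + p*(1-p)*E[R]^2.
        child_var = p * var + p * (1 - p) * mean ** 2
        # Two independent child subtrees plus the root itself.
        mean, var = 1 + 2 * p * mean, 2 * child_var
    return mean, var

for tau in (1, 2, 3, 10):
    mean, var = tree_moments(tau)
    # Compare with the lower bound stated in Theorem 4.2.
    assert var >= Fraction(tau, 12) * mean ** 2
```

Under these assumptions the mean grows linearly in the depth while the variance grows cubically, which is what makes the factor $\tau$ unavoidable.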

5 The Averaging Oracle

The averaging oracle uses i.i.d. simulations $\{\boldsymbol{\phi}_{i}\}_{i=1}^{\ell}$ . For a query $T$ it returns the average utility of the reachability set of $T$ : $\hat{{\sf A}}^{\tau}(T)=\mathop{{\texttt{{\sf Ave}}}}_{i\in[\ell]}\texttt{VReach}^{\tau}(\boldsymbol{\phi}_{i},T):=\frac{1}{\ell}\sum_{i=1}^{\ell}\texttt{VReach}^{\tau}(\boldsymbol{\phi}_{i},T)\ .$ We quantify the approximation guarantees of an averaging oracle in terms of a variance bound of the form (8).

Lemma 5.1.

Consider an SDM that for some $c\geq 1$ satisfies a variance bound of the form (8). Then for any $\epsilon,\delta<1$ , an averaging oracle constructed from $\ell\geq\epsilon^{-2}\delta^{-1}c$ i.i.d. simulations provides $(\epsilon,\delta)$ guarantees.

In particular for strongly submodular SDMs, we use the variance bound in Theorem 4.1 and obtain these approximation guarantees using $\ell\geq\epsilon^{-2}\delta^{-1}\tau$ i.i.d. simulations.

Proof.

Using variance properties of the average of i.i.d. random variables, we get that for any query $T$

[TABLE]

The claims follow from Chebyshev’s inequality, which states that for any random variable $X$ and any $M>0$ , $\Pr[|X-\textsc{E}[X]|\geq\epsilon M]\leq\epsilon^{-2}\mathop{\sf Var}[X]/M^{2}$ . We apply it to the random variable $\hat{{\sf A}}^{\tau}(T)$ , which has expectation $\texttt{I}^{\tau}(T)$ , and plug in the variance bound. To establish (6) we use $M=\texttt{I}^{\tau}(T)$ and to establish (7) we use $M=\textsc{OPT}^{\tau}_{1}$ . ∎
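A minimal Python sketch of an averaging oracle for live-edge simulations with additive utility. The representation (one adjacency dict of live edges per simulation) and the names `reach_tau`, `averaging_oracle`, and `simulations_needed` are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import deque

def reach_tau(adj, seeds, tau):
    """Nodes reachable from `seeds` within `tau` hops in one live-edge simulation.

    adj: dict mapping a node to an iterable of its live out-neighbors.
    """
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        if dist[u] == tau:
            continue  # step limit reached; do not expand further
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def averaging_oracle(simulations, seeds, tau, weight=lambda v: 1.0):
    """Average additive utility of the tau-step reachability set over simulations."""
    total = sum(sum(weight(v) for v in reach_tau(adj, seeds, tau))
                for adj in simulations)
    return total / len(simulations)

def simulations_needed(eps, delta, c):
    """Smallest integer l with l >= c / (eps^2 * delta), as in Lemma 5.1."""
    return math.ceil(c / (eps ** 2 * delta))
```

For example, with `eps = 0.5`, `delta = 0.25`, and `c = 2` the lemma asks for 32 simulations; the linear dependence on $\delta^{-1}$ is visible directly in `simulations_needed`.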

5.1 Sketched averaging oracle

For live-edge models with additive utility (4), the query efficiency of the averaging oracle can be improved with off-the-shelf use of $\tau$ -step combined reachability sketches [9, 13, 14, 10]. The sketching is governed by a sketch-size parameter $k$ that also determines the computation time of the sketches and the accuracy of the estimates they provide. A sketch of size $O(k)$ is computed for each node $v$ so that for any set of nodes $S$ , $\sum_{i=1}^{\ell}\texttt{VReach}^{\tau}(E_{i},S)$ can be efficiently estimated from the sketches of the nodes $v\in S$ . The computation of the sketches from an arbitrary set of simulations $\{E_{i}\}$ uses at most $\sum_{i}|E_{i}|+k\sum_{v}\max_{i}d_{v}(E_{i})$ edge traversals, where $d_{v}(E_{i})$ is the in-degree of node $v$ in simulation $E_{i}$ (so that $\max_{i}d_{v}(E_{i})$ is its maximum in-degree over the simulations). In the case of an IC model, the expected number of traversals is $(k+\ell)\sum_{e}p_{e}$ . Sketching with general node weights can be handled as in [10]. The estimates obtained from the sketches are unbiased with coefficient of variation $1/\sqrt{k-2}$ and are concentrated: Sketches of size $k=O(\epsilon^{-2}\log(\delta^{-1}))$ provide estimates with relative error $\epsilon$ with probability $1-\delta$ .

6 Confidence Amplification: The median-of-averages oracle

The statistical guarantees we provide for our averaging oracle are derived from variance bounds. The limitation is that the number of simulations we need to provide $(\epsilon,\delta)$ guarantees is linear in $\delta^{-1}$ and therefore the number of simulations we need to provide uniform guarantees (via a union bound argument) grows linearly with the number of subsets.

In order to find an approximate optimizer, we would like to have a uniform $\epsilon$ -approximation for all the ${n\choose s}$ subsets of size at most $s$ but doing so with an averaging oracle would require too many simulations. We adapt to our setting a classic confidence amplification technique [1] to construct an oracle where the number of simulations grows logarithmically in the confidence parameter $\delta^{-1}$ .

A median-of-averages oracle is specified by a number $r$ of pools with $\ell$ simulations in each pool. The oracle is therefore constructed from $r\ell$ i.i.d. simulations $\boldsymbol{\phi}_{ij}$ for $i\in[r]$ and $j\in[\ell]$ .

The simulations of each pool are used in an averaging oracle that for the $i$ th pool ( $i\in[r]$ ) returns the estimates $\hat{{\sf A}}^{\tau}_{i}(T)$ . The median-of-averages oracle returns the median value of these $r$ estimates

[TABLE]

We establish that when the i.i.d. simulations are from a model that has variance bound (8) for some $c\geq 1$ , the median-of-averages oracle provides $(\epsilon,\delta)$ approximation guarantees using $112\epsilon^{-2}c\ln\delta^{-1}$ i.i.d. simulations.

Lemma 6.1.

Consider an SDM that for some $c\geq 1$ satisfies the variance bound (8). Then for every $\epsilon$ and $\delta$ , a median-of-averages oracle $\widehat{{\sf mA}}$ organized with $r=28\ln\delta^{-1}$ pools of $\ell=4\epsilon^{-2}c$ simulations in each provides $(\epsilon,\delta)$ approximation guarantees.

Proof.

An averaging oracle with $\ell=4\epsilon^{-2}c$ simulations provides $(\epsilon,\delta_{A})$ approximation guarantees for $\delta_{A}=1/4$ (Lemma 5.1). Therefore, the probability of a correct estimate for any subset is at least $3/4$ . We now consider the estimates $\hat{{\sf A}}_{i}$ obtained from the $r$ pools when sorted in increasing order. The estimates that are not correct (too low or too high) will be at the prefix and suffix of the sorted order. The expected number of correct estimates is $\mu\geq\frac{3}{4}r$ . The probability that the median estimate is not correct is bounded by the probability that the number of correct estimates is at most $r/2\leq\frac{2}{3}\mu$ . From multiplicative Chernoff bounds, the probability of a sum of independent Bernoulli random variables being below $(1-\epsilon^{\prime})\mu$ is at most $e^{-\epsilon^{\prime 2}\mu/(2+\epsilon^{\prime})}$ . Using $\epsilon^{\prime}=1/3$ and $\mu\geq\frac{3}{4}r$ we have $\epsilon^{\prime 2}\mu/(2+\epsilon^{\prime})\geq\frac{1}{9}\cdot\frac{3}{4}\cdot\frac{3}{7}\cdot 28\ln\delta^{-1}=\ln\delta^{-1}$ , so the median is incorrect with probability at most $e^{-\ln\delta^{-1}}=\delta$ . ∎
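The pool arrangement of Lemma 6.1 can be sketched in a few lines of Python. The helper names `pool_sizes` and `median_of_averages` are ours, and the input is assumed to be a flat list of per-simulation utility values for one fixed query set.

```python
import math
import statistics

def pool_sizes(eps, delta, c):
    """Pool count r and pool size l from Lemma 6.1 (rounded up to integers)."""
    r = math.ceil(28 * math.log(1 / delta))
    l = math.ceil(4 * c / eps ** 2)
    return r, l

def median_of_averages(values, r, l):
    """Median over r pools of the per-pool mean of l per-simulation values.

    values: flat sequence of r*l utilities, one per simulation, for a fixed query set.
    """
    if len(values) != r * l:
        raise ValueError("expected r*l values")
    pool_means = [sum(values[i * l:(i + 1) * l]) / l for i in range(r)]
    return statistics.median(pool_means)

# Chernoff exponent from the proof with eps' = 1/3 and mu = (3/4) r,
# expressed per unit of ln(1/delta); it evaluates to 1 up to rounding.
exponent_per_log = (1 / 3) ** 2 * (3 / 4) * 28 / (2 + 1 / 3)
```

Note that `statistics.median` averages the two middle values when `r` is even; choosing an odd `r` avoids this and is a harmless rounding of the lemma's constant.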

As a corollary, we obtain a sample complexity bound for influence maximization from variance bounds:

Theorem 6.1.

Consider an SDM that satisfies the variance bound (8) for some $c\geq 1$ . Then for any $\epsilon<1$ and $\delta<1$ , using $112\epsilon^{-2}cs\ln\frac{n}{\delta}$ i.i.d. simulations we can return $T$ such that

[TABLE]

Proof.

We construct a median-of-averages oracle with $\ell=4\epsilon^{-2}c$ and $r=28\ln\delta_{MA}^{-1}$ where $\delta_{MA}=\delta/{n\choose s}$ . From Lemma 6.1 using a union bound over the ${n\choose s}$ sets we obtain that with probability $1-\delta$ the oracle provides a uniform $\epsilon$ -approximation for all subsets of size at most $s$ . Let $S$ be a set with maximum influence $\texttt{I}^{\tau}(S)=\textsc{OPT}^{\tau}_{s}$ and let $T$ be the oracle optimum

[TABLE]

We have

[TABLE]

We comment that the $(1-2\epsilon)$ ratio is not tight and we can obtain a bound closer to $(1-\epsilon)$ . This is because the particular set $S$ is approximated more tightly by the oracle (which uses enough simulations to support a union bound).

∎
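The simulation count in Theorem 6.1 can be traced from the pool parameters used in the proof. The following sketch uses only $\binom{n}{s}\leq n^{s}$ , $s\geq 1$ , and $\delta<1$ .

```latex
\begin{align*}
r\,\ell
  &= 28\ln\frac{\binom{n}{s}}{\delta}\cdot 4\epsilon^{-2}c
   \;\leq\; 112\,\epsilon^{-2}c\,\ln\frac{n^{s}}{\delta} \\
  &= 112\,\epsilon^{-2}c\left(s\ln n+\ln\frac{1}{\delta}\right)
   \;\leq\; 112\,\epsilon^{-2}c\,s\ln\frac{n}{\delta}.
\end{align*}
```

The last step uses $\ln\frac{1}{\delta}\leq s\ln\frac{1}{\delta}$ , which holds because $s\geq 1$ and $\ln\frac{1}{\delta}>0$ .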

7 Optimization with Adaptive sample size

The bound on the number of simulations we derived in Theorem 6.1 (through a median-of-averages oracle) and also the naive bound (1) (for the averaging oracle) are worst-case. This is obtained by using enough simulations to have the oracle provide a uniform $\epsilon$ -approximation with probability at least $1-\delta$ on any problem instance. To obtain the uniform approximation we applied a union bound over ${n\choose s}$ subsets that resulted in an increase in the number of required simulations by an $s\log n$ factor over the base $(\epsilon,\delta)$ approximation guarantees.

On real data sets, far fewer simulations than this worst case often suffice. We are interested in algorithms that adapt to such data and return a seed set of approximate maximum influence using a correspondingly smaller number of simulations while still providing statistical guarantees on the quality of the end result. To do so, we apply an adaptive optimization framework [11] (some example applications are [13, 36, 15, 12]). This framework consists of a “wrapper” that takes as inputs oracle constructions from simulations and a base algorithm that performs an optimization over an oracle. The wrapper invokes the algorithm on oracles constructed using an increasing number of simulations until a validation condition on the quality of the result is met. The details are provided in Appendix E. We denote by $r(\epsilon,\delta)$ the number of simulations that provides $(\epsilon,\delta)$ guarantees and we obtain the following results:

Theorem 7.1.

Suppose that on our data the averaging (respectively, median-of-averages) oracle $\hat{I}$ has the property that with $r$ simulations, with probability at least $1-\delta$ , the oracle optimum $T:=\arg\max_{S\mid|S|\leq s}\hat{I}(S)$ satisfies

[TABLE]

Then with probability at least $1-5\delta$ , when using $2\max\{r,r(\epsilon,\delta)\}+O\left(\epsilon^{-2}c\left(\ln{\frac{1}{\delta}}+\ln\left(\ln\ln\frac{n}{\delta}+\ln s\right)\right)\right)$ simulations with the median-of-averages oracle and $2\max\{r,r(\epsilon,\delta)\}+O\left(\epsilon^{-2}c\left(\ln{\frac{1}{\delta}}+\ln\left(\ln\ln\frac{n}{\delta}+\ln n\right)\right)\right)$ simulations with the averaging oracle, the wrapper outputs a set $T$ such that $\texttt{I}^{\tau}(T)\geq(1-5\epsilon)\textsc{OPT}_{s}^{\tau}$ .

The wrapper can also be used with a base algorithm that is an approximation algorithm. For live-edge models, our averaging oracle is monotone and submodular and hence we can apply greedy to efficiently compute a set with approximation ratio at least $1-1/e$ (with respect to the oracle). If we use greedy as our base algorithm we obtain the following:

Theorem 7.2.

If the averaging oracle $\hat{{\sf A}}$ is submodular and has the property that with $\geq r$ simulations, with probability at least $1-\delta$ , it provides a uniform $\epsilon$ -approximation for all subsets of size at most $s$ , then with $2\max\{r,r(\epsilon,\delta)\}+O\left(\epsilon^{-2}c\left(\ln{\frac{1}{\delta}}+\ln\left(\ln\ln\frac{n}{\delta}+\ln n\right)\right)\right)$ simulations we can find in polynomial time a $(1-(1-1/s)^{s})(1-5\epsilon)$ approximate solution with confidence $1-5\delta$ .

8 Approximate Greedy Maximization

In this section we consider the computational efficiency of maximization over our oracle $\hat{\texttt{I}}$ that approximates a monotone submodular influence function $\texttt{I}^{\tau}$ . The maximization problem is computationally hard: The brute force method evaluates $\hat{\texttt{I}}(S)$ on all $\binom{n}{s}$ subsets $S$ of size $s$ in order to find the oracle maximizer. An efficient algorithm for approximate maximization of a monotone submodular function $\hat{F}$ is greedy that sequentially builds a seed set $S$ by adding a node $u$ with maximum marginal contribution $\arg\max_{u\in V}(\hat{F}(S\cup\{u\})-\hat{F}(S))$ at each step. To implement greedy we only need to evaluate at each step the function on a linear number of subsets $\hat{F}(S\cup\{u\})$ for $u\in V$ and thus overall we do $sn$ evaluations of $\hat{F}$ on subsets. With a monotone and submodular $\hat{F}$ , for any $s\geq 1$ the subset $T$ that consists of the first $s$ nodes in a greedy sequence satisfies [35]:

[TABLE]

If our function $\hat{F}$ provides a uniform $\epsilon$ -approximation of another function $F$ for all subsets of size at most $s$ , then $F(T)\geq(1-(1-1/s)^{s})(1-2\epsilon)\textsc{OPT}_{s}(F)$ (see the proof of Theorem 6.1).
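A minimal Python sketch of greedy over a generic set-function oracle. The callable `oracle` stands in for $\hat{F}$ and is assumed to accept a frozenset of nodes; the toy coverage function is only there to make the example self-contained.

```python
def greedy(oracle, nodes, s):
    """Pick s nodes, each time adding the one with the largest marginal gain.

    oracle: callable mapping a frozenset of nodes to a number (assumed monotone).
    Performs at most s * len(nodes) oracle evaluations, plus one for the empty set.
    """
    chosen = []
    current = frozenset()
    current_value = oracle(current)
    for _ in range(s):
        best, best_gain = None, float("-inf")
        for u in nodes:
            if u in current:
                continue
            gain = oracle(current | {u}) - current_value
            if gain > best_gain:  # ties keep the first node encountered
                best, best_gain = u, gain
        if best is None:
            break  # fewer than s nodes available
        chosen.append(best)
        current = current | {best}
        current_value += best_gain
    return chosen

# Toy coverage oracle (monotone and submodular): value = number of covered items.
cover = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {1}}
coverage = lambda S: len(set().union(*(cover[u] for u in S)))
seeds = greedy(coverage, list(cover), 2)
```

In the setting of this section, `oracle` would be an averaging or median-of-averages oracle evaluated on the candidate seed set.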

The averaging oracle is monotone and submodular [29] when reachability functions are as in live-edge models. Unfortunately, our median-of-averages oracle, which facilitates tighter bounds on the number of simulations, is monotone but may not be submodular even for models where the averaging oracle is submodular. Generally when this is the case, greedy may fail (as highlighted in recent work by Balkanski et al. [2]).

Fortunately, greedy is effective on a function $\hat{F}$ that is monotone but not necessarily submodular as long as $\hat{F}$ "closely approximates" a monotone submodular $F$ in that marginal contributions of the form

[TABLE]

are approximated well by $\hat{F}(u\mid S)$ [13]. We apply this to establish the following lemma:

Lemma 8.1.

The greedy algorithm applied to a function $\hat{F}$ that is monotone and provides a uniform $\epsilon_{A}$ -approximation of a monotone submodular function $F$ where $\epsilon_{A}=\frac{\epsilon(1-\epsilon)}{14s}$ returns a set $T$ such that $F(T)\geq(1-(1-1/s)^{s})(1-\epsilon)\textsc{OPT}_{s}(F)$ .

Our proof of Lemma 8.1 generally applies to an approximate oracle $\hat{F}$ of any monotone submodular function $F$ and is presented in Appendix C. For approximate IM we obtain the following as a corollary:

Theorem 8.1.

Consider a submodular SDM ${\mathcal{G}}(V,H)$ that for some $c\geq 1$ satisfies the variance bound (8). Consider a median-of-averages oracle constructed with $O(\epsilon^{-2}s^{3}c\ln\frac{n}{\delta})$ simulations of $\mathcal{G}$ arranged as $r=O(s\ln\frac{n}{\delta})$ pools with $\ell=O(\epsilon^{-2}s^{2}c)$ simulations each. Then with probability $1-\delta$ , the set $T$ that contains the first $s$ nodes returned by greedy on the oracle satisfies $\texttt{I}^{\tau}(T)\geq(1-(1-1/s)^{s})(1-\epsilon)\textsc{OPT}^{\tau}_{s}$ .

Proof.

From Lemma 6.1, with appropriate constants, this configuration provides us with $(\epsilon/(14s),\delta)$ approximation guarantees. From Lemma 8.1 greedy provides the stated approximation ratio. ∎

Greedy on the median-of-averages oracle can be implemented generically for any SDM $\mathcal{G}$ by explicitly maintaining the reachability sets $\texttt{Reach}(\boldsymbol{\phi}_{ij},\{v\}\cup S)$ for all nodes $v\in V$ in each simulation $\boldsymbol{\phi}_{ij}$ as the greedy selects nodes into the seed set $S$ . For each step, we compute the oracle value (see (9)) and select $v$ for which the value for $\{v\}\cup S$ is maximized:

[TABLE]

We obtain approximation guarantees, however, only when the conditions of monotone submodular influence function and variance bounds are satisfied. For specific families of models, we can consider tailored efficient implementations that incrementally maintain reachability sets and values.

For live-edge models with additive utility (4) we consider an implementation of greedy on a median-of-averages oracle. This can be done by explicit maintenance of reachability sets or by using sketches [9, 13, 14, 10] (see Section 5.1). We obtain the following bounds (the proof is deferred to Appendix D):

Theorem 8.2.

Let $\mathcal{G}$ be a live-edge model with an additive utility function (4) that satisfies the variance bound (8). Then greedy on a median-of-averages oracle can be implemented with explicit reachability sets in time

[TABLE]

where $\overline{m}$ is the average number of edges per simulation (For an IC model, $c=\tau$ and $\textsc{E}[\overline{m}]=\sum_{e\in\mathcal{E}}p_{e}$ ). When using sketches, the time bound is

[TABLE]

where $m^{*}=\sum_{v}\max_{ij}d_{v}(E_{ij})$ . For an IC model, $c=\tau$ and $m^{*}=\sum_{e}p_{e}$ in expectation.

Conclusion

We explore the "sample complexity" of IM on stochastic diffusion models and show that an approximate maximizer (within a small relative error) can be recovered from a small number of simulations as long as the variance is appropriately bounded. We establish the variance bound for the large class of strongly submodular stochastic diffusion models. This includes IC models (where edges are drawn independently) and IGT models (where node thresholds are drawn independently) and natural extensions that allow for some dependencies. Our sample complexity bound significantly improves over the previous bounds by replacing the linear dependence in the number of nodes by a logarithmic dependence on the number of nodes and linear dependence on the length of the activation paths (which are usually very short). An interesting question for future work is to address the gap between the sample complexity and the larger number of simulations currently needed for greedy maximization.

Acknowledgements

This research is partially supported by the Israel Science Foundation (Grant No. 1841/14).

Appendix A Variance upper bound: Proof of Theorem 4.1

In this section we prove Theorem 4.1 which upper bounds the variance in strongly submodular SDMs. We start by bounding the variance in a more basic setting of a submodular function over a random subset in Section A.1 (Theorem A.1). This will be an ingredient in our main proof provided in Section A.3.

We will be using the following basic tools:

Lemma A.1.

If $X,Y$ are two random variables on the same probability space and the variance of $Y$ is finite, then:

[TABLE]

where $\textsc{E}[Y|X]$ is a random variable that takes the expectation of $Y$ conditioned on the value of $X$ and $\mathop{\sf Var}[Y|X]$ is a random variable that takes the variance of $Y$ conditioned on the value of $X$ .

When $X$ is a Bernoulli random variable $X\sim Ber(p)$ then Lemma A.1 gives that

[TABLE]

where $\textsc{V}_{0}=\mathop{\sf Var}[Y|X=0]$ , $\textsc{V}_{1}=\mathop{\sf Var}[Y|X=1]$ , $\textsc{E}_{0}=\textsc{E}[Y|X=0]$ , and $\textsc{E}_{1}=\textsc{E}[Y|X=1]$ .
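A small exact check of the law of total variance for a Bernoulli conditioning variable, written in the standard form $\mathop{\sf Var}[Y]=p\textsc{V}_{1}+(1-p)\textsc{V}_{0}+p(1-p)(\textsc{E}_{1}-\textsc{E}_{0})^{2}$ . The two conditional distributions below are made-up illustrative values.

```python
from fractions import Fraction

def moments(dist):
    """Mean and variance of a finite distribution given as {value: probability}."""
    mean = sum(p * y for y, p in dist.items())
    var = sum(p * (y - mean) ** 2 for y, p in dist.items())
    return mean, var

p = Fraction(1, 3)                                  # Pr[X = 1]
y_given_0 = {0: Fraction(1, 2), 2: Fraction(1, 2)}  # law of Y given X = 0
y_given_1 = {1: Fraction(1, 4), 5: Fraction(3, 4)}  # law of Y given X = 1

e0, v0 = moments(y_given_0)
e1, v1 = moments(y_given_1)

# Unconditional law of Y as a mixture of the two conditional laws.
mixture = {}
for weight, dist in ((1 - p, y_given_0), (p, y_given_1)):
    for y, q in dist.items():
        mixture[y] = mixture.get(y, 0) + weight * q
_, var_y = moments(mixture)

# Law of total variance specialized to a Bernoulli X.
decomposed = p * v1 + (1 - p) * v0 + p * (1 - p) * (e1 - e0) ** 2
assert var_y == decomposed
```

The first two terms are the expected conditional variance and the last term is the variance of the conditional expectation, which takes only the two values $\textsc{E}_{0}$ and $\textsc{E}_{1}$ .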

A.1 Submodular monotone functions on random subsets

Let $S=\{a_{i}\}_{1\leq i\leq t}$ be a set with $t$ elements and let $P=\{p_{i}\}_{1\leq i\leq t}$ be a set of $t$ probabilities such that $p_{i}$ is associated with the element $a_{i}$ . Let $X$ be a random subset of $S$ that contains $a_{i}$ with probability $p_{i}$ independently for each $i=1,...,t$ . That is

[TABLE]

We say that $X$ is a random subset of $S$ using probabilities $P$ .

A submodular monotone function $f$ over $S$ is a function with the following properties:

1. $f:2^{S}\rightarrow R^{+}$

2. For every $A,B\subset S$ with $A\subset B$ and for every $x\in S\setminus B$ we have that $f(A\cup x)-f(A)\geq f(B\cup x)-f(B)$

3. $A\subset B\Rightarrow f(A)\leq f(B)$

For any singleton $\{a\}\subset S$ we write $f(a)$ instead of $f(\{a\})$ . Let $\texttt{M}_{f}=\max_{i}\left(f(a_{i})-f(\emptyset)\right)$ . Our purpose in this subsection is to establish the following:

Theorem A.1.

Let $X$ be a random subset of $S$ using probabilities $P$ and let $f$ be a submodular monotone function. Then

[TABLE]
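For small ground sets the quantities in this subsection can be computed exactly by enumeration. The sketch below does so for an illustrative coverage function and compares the variance against $\texttt{M}_{f}\cdot\textsc{E}[f(X)-f(\emptyset)]$ ; that comparison reflects the form suggested by the base case of the proof and is our reading, not a restatement of the theorem.

```python
from fractions import Fraction
from itertools import product

def exact_moments(elements, probs, f):
    """Exact E[f(X)] and Var[f(X)] for a random subset X of `elements` that
    includes element i independently with probability probs[i]."""
    mean = second = Fraction(0)
    for mask in product((0, 1), repeat=len(elements)):
        weight = Fraction(1)
        for bit, p in zip(mask, probs):
            weight *= p if bit else 1 - p
        value = f(frozenset(a for a, bit in zip(elements, mask) if bit))
        mean += weight * value
        second += weight * value ** 2
    return mean, second - mean ** 2

# Illustrative monotone submodular f: coverage with f(empty set) = 0.
cover = {"a1": {1, 2}, "a2": {2, 3}, "a3": {3}}
f = lambda A: len(set().union(*(cover[a] for a in A)))
elements = list(cover)
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]

mean, var = exact_moments(elements, probs, f)
m_f = max(f({a}) for a in elements) - f(frozenset())
ratio = var / (m_f * mean)  # Var[f(X)] relative to M_f * E[f(X) - f(empty set)]
```

Enumeration is exponential in $t$ , so this is only a sanity check on toy instances.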

We give the following additional definitions and lemmas before proving this theorem.

Let $S_{-i}=S\setminus\{a_{i}\}$ and let $P_{-i}=P\setminus\{p_{i}\}$ . We define $X_{-i}$ to be a random subset of $S_{-i}$ using the probabilities $P_{-i}$ . Let $f^{0}_{i},f^{1}_{i}$ be submodular functions over $S_{-i}$ defined by $f^{0}_{i}(A)=f(A)$ and $f^{1}_{i}(A)=f(A\cup\{a_{i}\})$ . Let

[TABLE]

and

[TABLE]

By our definitions $\textsc{E}[f(X_{-i})]=\textsc{E}_{i}^{0}$ and from total expectation (Lemma A.1), $\textsc{E}[f(X)]=p_{i}\textsc{E}_{i}^{1}+(1-p_{i})\textsc{E}_{i}^{0}$ .

Lemma A.2.

Let $f$ be a submodular monotone function over $S$ and $X$ a random subset of $S$ using probabilities $P$ . Then,

[TABLE]

Proof.

Since $X$ is obtained by drawing the elements in $S$ independently it follows that

[TABLE]

∎

Lemma A.3.

For any submodular monotone function $f$ over $S$ and for any index $i$ we have that $\texttt{M}_{f^{0}_{i}}\leq\texttt{M}_{f}$ and $\texttt{M}_{f^{1}_{i}}\leq\texttt{M}_{f}$ .

Proof.

The first inequality follows immediately from our definition since

[TABLE]

For the second inequality we use submodularity as follows

[TABLE]

∎

We are now ready for the proof of Theorem A.1.

Proof.

(of Theorem A.1) The proof is by induction on the size of $S$ .

Base case: Let $S=\{a_{1}\},P=\{p_{1}\}$ we have that

[TABLE]

and

[TABLE]

It is left to prove that $\textsc{E}\left[f(X)-f(\emptyset)\right]\geq p_{1}(1-p_{1})\big{[}f(a_{1})-f(\emptyset)\big{]}$ , and indeed we have that

[TABLE]

Inductive Step: Assume the theorem holds for sets of size $\ell$ and any submodular function $f$ and probabilities $P$ . Consider a set $S$ with $\ell+1$ elements and a submodular function $f$ over $S$ , and let $j\leq\ell+1$ be an arbitrary index.

From the total variance formula in Lemma A.1 we know that

[TABLE]

where $\textsc{E}_{j}^{1}=\textsc{E}[f^{1}_{j}(X_{-j})]$ , $\textsc{E}_{j}^{0}=\textsc{E}[f^{0}_{j}(X_{-j})]$ , $\textsc{V}_{j}^{1}=\mathop{\sf Var}[f^{1}_{j}(X_{-j})]$ , and $\textsc{V}_{j}^{0}=\mathop{\sf Var}[f^{0}_{j}(X_{-j})]$ .

By applying the induction hypothesis to $S_{-j}$ with probabilities $P_{-j}$ and $|S_{-j}|=\ell$ and $f_{j}^{0}$ and $f_{j}^{1}$ we get that

[TABLE]

and

[TABLE]

Substituting these bounds in Equation (12) we get that

[TABLE]

∎

A.2 Properties of reduced diffusion models

We establish some properties of reduced independent SDMs that are needed for our upper bound.

We first show that influence values of nodes in a reduced model can only be lower than respective values in the original model:

Lemma A.4.

Let $\mathcal{G}^{\prime}(V\setminus T,H^{\prime})$ be a reduction of a model $\mathcal{G}(V,H)$ .

[TABLE]

Proof.

Note that $\mathcal{G}^{\prime}$ is obtained from $\mathcal{G}$ by removing nodes. Therefore respective reachability sets given $\boldsymbol{\phi}$ are such that those in $\mathcal{G}^{\prime}$ can only be subsets of those in $\mathcal{G}$ :

[TABLE]

Then from monotonicity and submodularity of $H$ we get

[TABLE]

(second inequality follows from monotonicity and submodularity of $H$ so that for all $A\subset V\setminus T$ , $H^{\prime}(A)\leq H(A)$ .) Therefore,

[TABLE]

∎

A convenient property is that reduction preserves strong monotone submodularity:

Lemma A.5.

A reduction of a strongly monotone submodular model is also strongly monotone submodular.

Proof.

A reduced model with respect to $T_{2}$ of a reduced model of $\mathcal{G}$ with respect to $T_{1}$ is a reduced model of $\mathcal{G}$ with respect to $T_{1}\cup T_{2}$ . Also note that the reduced utility function $H^{\prime}$ is also monotone and submodular. ∎

We next show that IC or IGT models with submodular utility are closed under reduction:

Theorem A.2.

IC and IGT models with submodular utility are strongly submodular SDMs.

Proof.

We first show that IC/IGT models are independent SDMs. In the introduction we expressed IC and IGT models as SDMs: A live-edge model is expressed as an SDM using $\phi_{v}(T)=1$ if and only if there is an edge from a node in $T$ to $v$ . The model is independent if for all $v$ the edges incoming to $v$ are independent of all other edges. In IC models all edges are independent and hence IC models are independent SDMs. Recall (from the Introduction) that an IGT model is expressed as an SDM using $\phi_{v}(T):=\text{Indicator}(\theta_{v}\leq f_{v}(T))$ . In an IGT model the thresholds $\theta_{v}$ are independent random variables, and hence $\phi_{v}$ are independent. Hence, an IGT model is an independent SDM. Submodularity of influence when utility is submodular is established for IC models in [29] and for IGT models in [33].

Reduction of any model preserves submodularity of the utility and in particular this holds for reduced IC/IGT models. What remains to show is that a reduced IC/IGT model is also an IC/IGT model (respectively). This would conclude the proof of strong submodularity since any IC or IGT model with submodular utility has a submodular influence function.

To establish this remaining claim we consider IC and IGT models in turn and show that the reduction, which is defined in terms of activation functions, can be expressed within the respective family of models.

We first consider IC models. The reduced IC model $\mathcal{G}^{\prime}(V\setminus T,\mathcal{E}\setminus(V\times T\cup T\times V))$ is obtained from $\mathcal{G}(V)$ by deleting the nodes $T$ and their incident edges and keeping $p_{e}$ on remaining edges. This is clearly an IC model. It remains to show that this is equivalent to the reduction of the distribution of activation functions. The conditioning that $\phi_{v}(T)=0$ is equivalent to a live-edge set $E$ with no edges from $T$ to $v$ . For such an edge set, for any $S\subset V\setminus(T\cup\{v\})$ we have $\phi^{\prime}_{v}(S)=\phi_{v}(S\cup T)=\phi_{v}(S)$ , which corresponds to $E$ having at least one edge from $S$ to $v$ . From independence of edges, the conditional distribution is also independent and retains the same inclusion probabilities.

We next consider IGT models. The reduction $\mathcal{G}^{\prime}(V\setminus T,\{f^{\prime}_{v}\})$ in terms of the distribution of activation functions is equivalent to modifying the functions so that

[TABLE]

The reduced model is clearly an IGT model. The conditioning that $\phi_{v}(T)=0$ means that $\theta_{v}>f_{v}(T)$ . Therefore, the conditional distribution of $\theta_{v}$ provided it was not activated in the first step is uniform on $[f_{v}(T),1]$ . The probability that $\theta_{v}>f_{v}(S\cup T)$ given this conditioning is equal to the probability that $\theta^{\prime}_{v}>f_{v}(S\cup T)-f_{v}(T)=f^{\prime}_{v}(S)$ . ∎

A.3 Upper bound on the variance in strongly submodular SDM

Let $\mathcal{G}(V,H)$ be a $\tau$ -stepped diffusion model. We denote by $\texttt{M}^{\tau}_{\mathcal{G}}(\bar{T})$ the maximum influence of a single node in $\mathcal{G}$ that is not included in $T$ :

[TABLE]

As before, we omit $\mathcal{G}$ if it can be understood from the context. We prove the following theorem which is a restatement of Theorem 4.1.

Theorem A.3.

Let $\mathcal{G}(V,H)$ be a strongly submodular SDM. Then for any $\tau\geq 0$ and a set of nodes $T\subset V$ :

[TABLE]

The remaining part of this Subsection contains the proof of the Theorem.

Let $T$ be a set of nodes, and let

[TABLE]

be the nodes that have nonzero probability to be activated if $T$ is active. For the special case of IC models, $N(T)=\left\{v\notin T\mid\exists(u,v)\in\mathcal{E},u\in T\right\}$ is the set of outgoing neighbors of $T$ .

We first consider the case where $N(T)$ is empty. In this case, $\texttt{Reach}^{\tau}(T)=T$ for all $\tau\geq 0$ . Therefore, $\mathop{\sf Var}[\texttt{R}^{\tau}(T)]=0$ , $\texttt{I}^{\tau}(T)=H(T)\geq 0$ , and $\texttt{M}^{\tau-1}(\bar{T})=0$ and the claim holds.

We now assume that $N(T)$ is not empty and give a proof by induction on $\tau$ .

A.3.1 Base case ( $\tau=1$ )

Let

[TABLE]

be the probability that node $v$ is activated in step $1$ provided that the set of nodes $T$ was active at step $0$ . From independence of the model, the events of activating different nodes $v\in N(T)$ at step 1 are independent. We have that the set of nodes that is active at step 1 is a random subset $S$ of $N(T)$ with probabilities $\{p_{v}\}$ as defined in Subsection A.1. Moreover, from monotonicity and submodularity of $H$ , the function $f(S):=H(T\cup S)-H(T)$ is monotone and submodular with $f(\emptyset)=0$ . We can therefore apply Theorem A.1 to bound the variance of $f(S)$ :

[TABLE]

We now note that

[TABLE]

and

[TABLE]

For all $v\in N(T)$ we have $f(v)=H(T\cup\{v\})-H(T)\leq H(v)=\texttt{I}^{0}(v)$ . Therefore

[TABLE]

Substituting in (14) we obtain the claim

[TABLE]

A.3.2 Inductive step

We define $\texttt{Reach}^{t}_{\mathcal{G}}(T\mid A)$ to be the random variable that is the $t$ -steps reachability of $T$ in a diffusion on $\mathcal{G}$ seeded with $T$ that is conditioned on the event that exactly the nodes $A\subset N(T)$ (and no other nodes) are activated in step 1. Equivalently, we condition on $\boldsymbol{\phi}$ such that for $v\in V\setminus(T\cup A)$ , $\phi_{v}(T)=0$ and for $v\in A$ , $\phi_{v}(T)=1$ . We respectively define $\texttt{R}^{t}_{\mathcal{G}}(T\mid A)$ to be the random variable $H(\texttt{Reach}^{t}_{\mathcal{G}}(T\mid A))$ . By definition, we have

[TABLE]

We consider the reduced model $\mathcal{G}^{\prime}$ of $\mathcal{G}$ with respect to $T$ and show that the conditioned $t\geq 1$ steps diffusion from $T$ in $\mathcal{G}$ is equivalent to the unconditioned $t-1$ steps diffusion from $A$ in $\mathcal{G^{\prime}}$ :

Lemma A.6.

For any $A\subset N(T)$ and $t\geq 1$ , the random variables $\texttt{Reach}^{t-1}_{\mathcal{G}^{\prime}}(A)$ and $\texttt{Reach}^{t}_{\mathcal{G}}(T|A)\setminus T$ have identical distributions over subsets. The random variables $\texttt{R}^{t-1}_{\mathcal{G}^{\prime}}(A)$ and $\texttt{R}^{t}_{\mathcal{G}}(T|A)-H(T)$ have identical distributions over values.

Proof.

We first consider $t=1$ . For a draw of conditioned activation functions we have $\texttt{Reach}^{1}_{\mathcal{G}}(T|A)=T\cup A$ . By definition, we also have $\texttt{Reach}^{0}_{\mathcal{G}^{\prime}}(A)=A$ and the claim holds.

We next consider $t>1$ . We first observe that in both situations, (i) the reduced model $\mathcal{G}^{\prime}$ when seeded with $A$ and (ii) the conditioned diffusion in $\mathcal{G}$ seeded with $T$ such that the nodes $A$ are activated in the first step, the progression is determined only by the activation functions on the nodes $V\setminus(T\cup A)$ .

We next argue that the distribution of activation functions projected on the nodes $V\setminus(T\cup A)$ is the same in both situations. From independence of $\mathcal{G}$ it suffices to consider separately the activation functions of each node. From definition of a reduced model, we draw for each $v\in V\setminus T$ , $\phi_{v}\sim\mathcal{G}$ conditioned on $\phi_{v}(T)=0$ . This is exactly what we get for the conditioned diffusion in $\mathcal{G}$ .

We can thus match the supports (sets of activation functions) in both situations so that $\boldsymbol{\phi}$ and $\boldsymbol{\phi}^{\prime}$ are matched when their projections on $V\setminus(T\cup A)$ are the same. The set of newly active nodes at step $0$ of the reduced model and at step $1$ of the conditioned process is $A$ in both cases, so the progression of new activations is the same. Therefore, for any step $t\geq 1$ ,

[TABLE]

and the first claim follows.

For the second claim, note that $\texttt{R}^{t}_{\mathcal{G}}(T|A)=H(\texttt{Reach}^{t}_{\mathcal{G}}(T|A))$ and thus

[TABLE]

where the equalities are those of distributions. ∎

As immediate corollaries we can relate the expectations and variances of the two processes as follows:

[TABLE]

Total Variance:

We define the random variable $A$ to be the subset of $N(T)$ which is activated after the first step. Note that $A$ is a random subset of $N(T)$ using probabilities $p_{v}$ for $v\in N(T)$ as defined in Section A.1. By the total variance formula we get that

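$$\mathop{\sf Var}[\texttt{R}^{\tau}_{\mathcal{G}}(T)]=\mathop{\sf Var}_{A}\left[\textsc{E}[\texttt{R}^{\tau}_{\mathcal{G}}(T\mid A)]\right]+\textsc{E}_{A}\left[\mathop{\sf Var}[\texttt{R}^{\tau}_{\mathcal{G}}(T\mid A)]\right].\qquad(18)$$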

We bound the total variance by separately bounding the two terms.

Bound on the first term of the total variance:

We consider the reduced model $\mathcal{G}^{\prime}$ with respect to $T$ and a restriction of the influence function $\texttt{I}^{\tau-1}_{\mathcal{G}^{\prime}}$ to the domain that is subsets $A\subseteq N(T)$ :

[TABLE]

From Lemma A.6, this function represents the expected marginal utility of the nodes outside $T$ that are activated after $\tau$ steps if we activate $T$ at step $0$ and the set $A$ at step $1$ .

We first observe that $f$ is monotone and submodular. This is because strong monotone submodularity of our model implies that the reduced model is also strongly monotone and submodular, and a restriction of a monotone and submodular function is also monotone and submodular. We establish two helpful properties of $f$ . First,

[TABLE]

which holds for any influence function. Second, using Lemma A.4 we obtain

[TABLE]

We are now ready to bound the first term of the total variance (18). Our monotone submodular function $f$ and the random subset $A$ using probabilities $p_{v}$ satisfy the conditions of Theorem A.1.

[TABLE]

Bound on the second term of the total variance:

We next bound the second term of (18) which is the expectation of the variance conditioned on $A$ :

[TABLE]

where we take

[TABLE]

to be the subset that maximizes the ratio.

Using the induction hypothesis on $(\tau-1)$ -stepped influence we get

[TABLE]

We now relate the maximum influence of nodes in the original and reduced models:

[TABLE]

From (22) using (23) and (24) we obtain

[TABLE]

Combining the bounds of the first and second terms

The claim of the Theorem follows using total variance (18) and the bounds on the first term (21) and second term (25).

Appendix B Variance lower bound construction

Lemma B.1.

Let $\mathcal{G}$ be a complete binary tree where each edge has probability $\frac{1}{2}$ and let $h(u)$ be the height of the node $u$ . Then, $\texttt{I}^{\tau}(u)=h(u)$ and $\mathop{\sf Var}[\texttt{R}^{\tau}(u)]=\frac{1}{2}\sum_{i=0}^{h(u)-1}i^{2}=\frac{\left(h(u)-1\right)h(u)\left(2h(u)-1\right)}{12}$ .

Proof.

By induction on the height of the node.

Base step ( $h(u)=1$ ): It is clear that $\texttt{I}^{1}(u)=1$ and $\mathop{\sf Var}\left[\texttt{R}^{1}(u)\right]=0$ since $u$ is a leaf.

Inductive step: $u$ has two neighbors and each is reached with probability $\frac{1}{2}$ . Let $v_{1}$ and $v_{2}$ be the neighbors of $u$ and let $X_{1},X_{2}$ be random variables that indicate if $(u,v_{1}),(u,v_{2})$ were activated respectively. The variables $X_{1},X_{2}$ are Bernoulli random variables with $p=\frac{1}{2}$ , hence, $\textsc{E}[X_{1}]=\textsc{E}[X_{2}]=\frac{1}{2}$ and $\mathop{\sf Var}[X_{1}]=\mathop{\sf Var}[X_{2}]=\frac{1}{4}$ . Since the graph is a tree, the reachabilities of $v_{1}$ and $v_{2}$ are independent random variables, so we can simply write:

[TABLE]

The variables $\texttt{R}^{\tau-1}(v_{1})$ and $\texttt{R}^{\tau-1}(v_{2})$ are identically distributed, and $X_{1},\texttt{R}^{\tau-1}(v_{1})$ and $X_{2},\texttt{R}^{\tau-1}(v_{2})$ are independent random variables. Thus,

[TABLE]

The computation of the variance is similar:

[TABLE]

For two independent random variables $A,B$ it holds that $\mathop{\sf Var}[AB]=\mathop{\sf Var}[A]\mathop{\sf Var}[B]+\mathop{\sf Var}[A]\textsc{E}^{2}[B]+\mathop{\sf Var}[B]\textsc{E}^{2}[A]$ . We therefore have that:

[TABLE]

∎
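The closed forms in Lemma B.1 can be checked exactly for small heights. The following is a minimal sketch, assuming unit node weights, leaves of height $1$, and $\tau\geq h(u)$ so that truncation plays no role. The function names are introduced here for illustration only. It builds the exact distribution of the reachability from the recursion $\texttt{R}(u)=1+X_{1}\texttt{R}(v_{1})+X_{2}\texttt{R}(v_{2})$ and does not rely on the product-variance identity.

```python
from collections import defaultdict
from fractions import Fraction


def reach_distribution(height):
    """Exact distribution of the number of nodes reached from a node of the
    given height (leaves have height 1) when each edge is live w.p. 1/2."""
    dist = {1: Fraction(1)}  # a leaf reaches only itself
    half = Fraction(1, 2)
    for _ in range(height - 1):
        # One child branch contributes 0 if its edge is dead, else R(child).
        branch = defaultdict(Fraction)
        branch[0] += half
        for value, prob in dist.items():
            branch[value] += half * prob
        # R(u) = 1 + branch_1 + branch_2, with the two branches independent.
        nxt = defaultdict(Fraction)
        for a, pa in branch.items():
            for b, pb in branch.items():
                nxt[1 + a + b] += pa * pb
        dist = dict(nxt)
    return dist


def mean_and_variance(dist):
    """Mean and variance of a finite distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    var = sum((v - mean) ** 2 * p for v, p in dist.items())
    return mean, var


for h in range(1, 7):
    mean, var = mean_and_variance(reach_distribution(h))
    closed_form = Fraction((h - 1) * h * (2 * h - 1), 12)
    print(h, mean, var, closed_form)
```

Exact rational arithmetic is used so that the comparison with the closed form is an equality test and not a floating-point approximation.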

Theorem B.1.

There is a model $\mathcal{G}$ and a set of nodes $T$ such that $\frac{\mathop{\sf Var}[\texttt{R}^{\tau}(T)]}{\texttt{M}^{\tau}(\bar{T})\texttt{I}^{\tau}(T)}\geq\frac{\tau}{12}$ .

Proof.

Lemma B.1 shows that for every node $u\in\mathcal{G}$ , $\texttt{I}^{\tau}(u)=h(u)$ and $\mathop{\sf Var}[\texttt{R}^{\tau}(u)]=\frac{\left(h(u)-1\right)h(u)\left(2h(u)-1\right)}{12}$ . It follows that the root $r$ has the largest influence $\texttt{I}^{\tau}(r)=\tau$ and $\mathop{\sf Var}[\texttt{R}^{\tau}(r)]=\frac{(\tau-1)\tau(2\tau-1)}{12}$ . Furthermore, $\texttt{M}^{\tau}(\bar{r})=\tau-1$ since the nodes of the largest influence in $V\setminus r$ are the children of $r$ . We conclude that:

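$$\frac{\mathop{\sf Var}[\texttt{R}^{\tau}(r)]}{\texttt{M}^{\tau}(\bar{r})\,\texttt{I}^{\tau}(r)}=\frac{(\tau-1)\tau(2\tau-1)}{12(\tau-1)\tau}=\frac{2\tau-1}{12}\geq\frac{\tau}{12}.$$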

∎

Appendix C Greedy optimization with approximate non-submodular oracle

In this section we present the proof of Lemma 8.1. We show that our approximation guarantees imply that the application of greedy on $\hat{F}$ generates a sequence that is an approximate greedy sequence (in the sense of Lemma C.1) with respect to $F$ .

We first state a helpful lemma [13], which establishes that it suffices for $\hat{F}(u\mid S)$ to approximate the marginal contributions

[TABLE]

.

Lemma C.1.

[13] Given a monotone submodular function $F$ , an approximate greedy algorithm that for some $\epsilon\in[0,1)$ selects at each step an element $u$ such that $F(u\mid S)\geq(1-\epsilon)\max_{v}F(v\mid S)$ has approximation ratio $\geq(1-(1-1/s)^{s})(1-\epsilon)$ .

Proof.

It is easy to see that the approximation ratio of $\epsilon$ -approximate greedy is $1-(1-(1-\epsilon)/s)^{s}$ . It therefore suffices to establish that this expression is larger than $(1-(1-1/s)^{s})(1-\epsilon)$ for $\epsilon\in[0,1].$ Equivalently, we need to show that for all $s\geq 2$ and $x\in[0,1]$

[TABLE]

This follows from equality holding for $x=0$ and $x=1$ and the function being concave up (second derivative is positive). ∎
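The inequality between the two ratios can also be checked numerically on a grid. The snippet below is a small sanity check, not a proof. The grid ranges ($s\leq 50$, $\epsilon$ in steps of $0.01$) and the function names are arbitrary choices made for this illustration.

```python
def approx_greedy_ratio(s, eps):
    """Ratio 1 - (1 - (1 - eps)/s)^s of eps-approximate greedy."""
    return 1.0 - (1.0 - (1.0 - eps) / s) ** s


def target_ratio(s, eps):
    """Ratio (1 - (1 - 1/s)^s)(1 - eps) claimed in Lemma C.1."""
    return (1.0 - (1.0 - 1.0 / s) ** s) * (1.0 - eps)


# Smallest gap between the two expressions over the grid; it should be
# non-negative up to floating-point rounding.
worst_gap = min(
    approx_greedy_ratio(s, i / 100) - target_ratio(s, i / 100)
    for s in range(1, 51)
    for i in range(0, 101)
)
print(worst_gap)
```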

Proof of Lemma 8.1.

Consider a monotone non-negative $\hat{F}$ that is a uniform $\epsilon_{A}$ -approximation of a monotone non-negative $F$ with $\epsilon_{A}=\frac{\epsilon(1-\epsilon)}{14s}$ . By definition of $\epsilon$ -approximation (see Section 3.2), $\left|\hat{F}(S)-F(S)\right|\leq\epsilon_{A}\max\{F(S),\textsc{OPT}_{1}(F)\}$ for all $S$ with $|S|\leq s$ . Therefore,

[TABLE]

Inequality (26) follows immediately when $F(S)\geq\textsc{OPT}_{1}(F)$ because the relative error is at most $\epsilon_{A}\leq\frac{\epsilon}{14s}$ . For $(1-\epsilon)\textsc{OPT}_{1}(F)\leq F(S)<\textsc{OPT}_{1}(F)$ the absolute error is at most $\epsilon_{A}\textsc{OPT}_{1}(F)$ , which is a relative error of at most $\epsilon_{A}/(1-\epsilon)\leq\frac{\epsilon}{14s}$ . Inequality (27) follows from the absolute error being at most $\epsilon_{A}\textsc{OPT}_{1}(F)$ and $\epsilon_{A}\leq\epsilon/2$ .

We establish that these conditions imply that greedy on $\hat{F}$ on the prefix of the greedy sequence where $F(S)\leq\frac{3}{4}\textsc{OPT}_{s}(F)$ is actually approximate greedy (as in the conditions of Lemma C.1) with respect to $F$ . Note that $1-(1-1/s)^{s}\leq 3/4$ for $s\geq 2$ , so the target approximation ratio is already met once $F(S)$ exceeds $\frac{3}{4}\textsc{OPT}_{s}(F)$ , and thus the prefix restriction does not limit generality. The claim will then follow from Lemma C.1.

For $s=1$ , it follows from Equations (26) and (27) that the first element of a greedy sequence with respect to $\hat{F}$ , $\arg\max_{u}\hat{F}(u)$ , satisfies $\hat{F}(u)\geq(1-\frac{\epsilon}{14})\textsc{OPT}_{1}(F)$ . Therefore, from the second iteration on, we have a set $S$ for which the relative error bound in Equation (26) applies.

We consider the marginal contributions $F(u\mid S)$ for any node $u$ . We have

[TABLE]

We use these inequalities to bound the absolute error of (any) marginal influence estimate by

[TABLE]

We now consider the node $v=\arg\max_{u\in V}F(u\mid S)$ with maximum marginal contribution to $S$ with respect to $F$ and its contribution value

[TABLE]

Thus, when $F(S)\leq\frac{3}{4}\textsc{OPT}_{s}$ ,

[TABLE]

By applying (C) to $v$ we get that

[TABLE]

Therefore the node $v^{\prime}=\arg\max_{u}\hat{F}(u\mid S)$ with maximum marginal contribution according to $\hat{F}$ satisfies

[TABLE]

By using (C) again, substituting (29), and using the fact that $s\geq 2$ :

[TABLE]

Therefore, the greedy sequence according to $\hat{F}$ is an approximate greedy sequence according to $F$ and satisfies the conditions of Lemma C.1. Therefore the resulting sequence yields an approximation ratio at least $(1-(1-1/s)^{s})(1-\epsilon)$ . ∎
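The approximate greedy condition of Lemma C.1 can be illustrated on a toy instance. The sketch below is an assumption-laden example: it uses a set-coverage function as the monotone submodular $F$, picks at each step the weakest element that still satisfies $F(u\mid S)\geq(1-\epsilon)\max_{v}F(v\mid S)$, and compares the result with a brute-force optimum. The instance and all function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def coverage(sets, chosen):
    """Monotone submodular toy objective: size of the union of chosen sets."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def approximate_greedy(sets, s, eps):
    """Greedy that adversarially picks the smallest admissible marginal gain."""
    chosen = []
    for _ in range(s):
        base = coverage(sets, chosen)
        gains = {i: coverage(sets, chosen + [i]) - base
                 for i in range(len(sets)) if i not in chosen}
        best = max(gains.values())
        # Elements allowed by the (1 - eps)-approximate greedy condition.
        admissible = [i for i, g in gains.items() if g >= (1 - eps) * best]
        chosen.append(min(admissible, key=lambda i: (gains[i], i)))
    return chosen


def brute_force_opt(sets, s):
    """Exact optimum over all subsets of size s (only viable for tiny inputs)."""
    return max(coverage(sets, c) for c in combinations(range(len(sets)), s))


sets = [{0, 1, 2, 3}, {3, 4, 5}, {5, 6}, {0, 6, 7, 8}, {1, 4, 9}, {2, 7}]
s, eps = 3, 0.3
chosen = approximate_greedy(sets, s, eps)
value = coverage(sets, chosen)
opt = brute_force_opt(sets, s)
bound = (1 - (1 - 1 / s) ** s) * (1 - eps) * opt
print(chosen, value, opt, bound)
```

The adversarial tie-breaking is a design choice for the illustration: it exercises the worst admissible selection, which is the case the lemma has to cover.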

Appendix D Greedy for live-edge models

Proof of Theorem 8.2:

Proof.

For the first bound, we explicitly maintain for each node $u\in V$ , for each pool, the reachability set of $u$ in the simulations of the pool (and its cardinality). The dominant term in the cost of computation is performing a BFS from each node in each of the $r\ell$ simulations that is truncated at distance $\tau$ . The total computation time is

[TABLE]

where

[TABLE]

is the average number of edges per simulation. For an IC model, $\textsc{E}[\overline{m}]=\sum_{e\in\mathcal{E}}p_{e}$ . When a node $u$ is selected into the seed set we remove all nodes in its reachability set from the reachability sets of all other nodes. The removal cost can be "charged" to the initial reachability computation.
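The dominant step, a BFS truncated at distance $\tau$ in each simulation, can be sketched as follows. This is a minimal illustration that assumes unit node weights and represents each simulation as a dictionary from a node to its live out-neighbors. That representation and the function names are assumptions made here, not the authors' implementation.

```python
from collections import deque


def truncated_reach(live_edges, seeds, tau):
    """Nodes within tau hops of the seeds in one simulation.

    live_edges maps a node to an iterable of its live out-neighbors."""
    hops = {u: 0 for u in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        if hops[u] == tau:
            continue  # do not expand beyond distance tau
        for v in live_edges.get(u, ()):
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return set(hops)


def averaged_reach(simulations, seeds, tau):
    """Average tau-step reachability size over a list of simulations."""
    total = sum(len(truncated_reach(e, seeds, tau)) for e in simulations)
    return total / len(simulations)


simulations = [{0: [1], 1: [2], 2: [3]}, {0: [2]}]
print(truncated_reach(simulations[0], [0], 2))
print(averaged_reach(simulations, [0], 2))
```

Each call visits every live edge at most once, which matches the per-simulation cost that the bound above charges to the BFS.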

The dependence of the computation time on the graph size can be improved by using combined reachability sketches [9, 13, 14, 10] instead of maintaining the reachability sets explicitly (see Section 5.1). The sketch size needed in order to provide the required accuracy of $O(\epsilon/s)$ (as in Theorem 8.1) uniformly for all subsets of size at most $s$ is $k=O(\epsilon^{-2}s^{3}\ln{n})$ . We compute a sketch for each node in each of the $r$ pools, so in total we have $rn$ node sketches. The construction time of these sketches has a term $\sum_{ij}|E_{ij}|=r\ell\overline{m}$ linear in the total size of simulations and a term for sketch constructions which is a product of the number of pools $r$ and the construction time for each pool. The per-pool construction time is as described in Section 5.1 and is bounded by $k$ (sketch size) visits for each node, each involving reverse traversals of incoming edges of the node in some simulation. The per-pool construction time for an IC model is $O(k(n+\sum_{e}p_{e}))$ in expectation. The time with arbitrary simulations for pool $i$ is $O(k(n+\sum_{v}\max_{j}d_{v}(E_{ij})))$ . In total over all pools, the construction time is dominated by $O(r(\ell\overline{m}+k(n+m^{*})))$ , where $m^{*}=\sum_{v}\max_{ij}d_{v}(E_{ij})$ for arbitrary simulations and $m^{*}=\sum_{e}p_{e}$ for simulations generated by an IC model.

The sketches improve the computation time of greedy. Each iteration of greedy uses the (precomputed) union sketch of the current seed set $S$ in each pool. It then examines the sketches of each $v\in V$ to compute the estimate of the averaging oracle $\hat{{\sf A}}(S\cup\{v\})$ in each pool. These operations take $O(knr)$ in total for the iteration. Therefore $s$ iterations of greedy maximization take $O(knrs)$ using the sketches. Combining the construction cost of the sketches using $r=O(s\ln{\frac{n}{\delta}})$ and $\ell=O(\epsilon^{-2}s^{2}c)$ and the greedy implementation over the sketches we obtain a total bound on the computation time of

[TABLE]

∎

Appendix E Optimization with adaptive sample size

The pseudocode for our wrapper is provided in Algorithm 1. The inputs to the wrapper are a base algorithm ${\mathcal{A}}$ and two constructions of oracles from sets of simulations. The first construction produces an oracle, $\hat{F_{v}}$ , that we use for validation. The second construction produces oracles, $\hat{F}_{x}$ , that are provided as input to ${\mathcal{A}}$ to perform the optimization. The oracles provide an approximation of our influence function $\texttt{I}^{\tau}(S)$ with non-uniform guarantees. For specified $(\epsilon,\delta)$ we use the expressions $r_{v}(\epsilon,\delta)$ or $r_{x}(\epsilon,\delta)$ for the number of simulations required to obtain $(\epsilon,\delta)$ guarantees (in the sense of Section 3.2). This gives us a relation between $\epsilon$ , $\delta$ , and a number of simulations. When constructing an oracle with a given number of simulations $r$ and a specified $\epsilon$ , we can determine the confidence $\delta$ we have from $\epsilon$ and $r$ . The oracles that we consider have the property that for a fixed $\epsilon$ , $\delta$ decreases at least linearly with the number of simulations. (i.e., when we double the number of simulations $\delta$ decreases by at least a factor of $2$ .)

The wrapper first determines an upper bound ( $\lceil\log_{2}M/r_{x}(\epsilon,\delta)\rceil$ ) on the maximum number of iterations it performs (based on the initial number and the simulation budget) and constructs a validation oracle that provides guarantees for a small number of sets (queries) which equals this maximum number of iterations. It then starts with a set $\mathcal{R}$ of $r_{x}(\epsilon,\delta)$ simulations that suffice for the oracle $\hat{F}_{x}$ to provide (non-uniform) $(\epsilon,\delta)$ approximation guarantees. The wrapper repeats the following: It constructs an “optimization” oracle $\hat{F}_{x}$ using the set of simulations $\mathcal{R}$ and applies ${\mathcal{A}}$ over $\hat{F}_{x}$ to obtain a set $T$ . The wrapper terminates when $\hat{F}_{v}(T)$ is close to $\hat{F}_{x}(T)$ or when our simulation budget of $M$ is exceeded. Otherwise, we double the number of simulations in our set $\mathcal{R}$ and repeat.
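The doubling loop described above can be sketched in code. This is only an illustration under stated assumptions and is not Algorithm 1 itself: all callables (`draw_simulations`, `build_opt_oracle`, `build_val_oracle`, `base_algorithm`) are hypothetical stand-ins, the validation test $\hat{F}_{v}(T)\geq\frac{1-2\epsilon}{1+\epsilon}\hat{F}_{x}(T)$ is the condition used in the proofs of this appendix, and `r_v` is assumed to have been chosen for the reduced confidence $\delta_{v}$. The toy usage treats a "set" as a single node.

```python
import random


def adaptive_wrapper(draw_simulations, build_opt_oracle, build_val_oracle,
                     base_algorithm, r_x, r_v, budget, eps):
    """Doubling wrapper sketch; returns (candidate, simulations used, validated)."""
    f_v = build_val_oracle(draw_simulations(r_v))  # fixed validation oracle
    sims = draw_simulations(r_x)                   # initial optimization pool
    while True:
        f_x = build_opt_oracle(sims)
        candidate = base_algorithm(f_x)
        validated = f_v(candidate) >= (1 - 2 * eps) / (1 + eps) * f_x(candidate)
        if validated or 2 * len(sims) > budget:
            return candidate, len(sims), validated
        sims = sims + draw_simulations(len(sims))  # double the pool


def toy_usage(seed=0):
    """Toy model: node 0 activates each other node independently w.p. 1/2."""
    rng = random.Random(seed)
    nodes = range(4)

    def draw(k):
        return [{v for v in nodes if v == 0 or rng.random() < 0.5}
                for _ in range(k)]

    def averaging_oracle(sims):
        # Node 0 reaches the live set; every other node reaches only itself.
        return lambda u: sum(len(s) if u == 0 else 1 for s in sims) / len(sims)

    def best_single_node(oracle):
        return max(nodes, key=oracle)

    return adaptive_wrapper(draw, averaging_oracle, averaging_oracle,
                            best_single_node, r_x=8, r_v=64, budget=1024,
                            eps=0.2)


print(toy_usage())
```

Returning the `validated` flag alongside the candidate is a choice made for this sketch so that a caller can distinguish termination by validation from termination by budget.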

The wrapped algorithm ${\mathcal{A}}$ can be an exact or approximate optimizer. It is applied to the oracle function and therefore its quality guarantees are with respect to how well the oracle value $\hat{F}_{x}(T)$ of the output set $T$ approximates the oracle optimum $\max_{S\mid|S|\leq s}\hat{F}_{x}(S)$ . The wrapper extends the approximation guarantees that ${\mathcal{A}}$ provides (with respect to the oracle) to a guarantee with respect to the influence function while avoiding the worst-case number of simulations needed for a uniform approximation.

We first establish some basic properties.

Lemma E.1.

Let $S$ be a set with maximum influence (with $\texttt{I}^{\tau}(S)=\textsc{OPT}^{\tau}_{s}$ ). With probability at least $1-2\delta$ , all the optimization oracles $\hat{F_{x}}$ constructed by the wrapper have $(1-\epsilon)\textsc{OPT}^{\tau}_{s}\leq\hat{F_{x}}(S)\leq(1+\epsilon)\textsc{OPT}^{\tau}_{s}$ .

Proof.

The probability that $(1-\epsilon)\textsc{OPT}^{\tau}_{s}\leq\hat{F_{x}}(S)\leq(1+\epsilon)\textsc{OPT}^{\tau}_{s}$ fails for the first oracle is at most $\delta$ . The number of simulations that $\hat{F}_{x}$ uses doubles in each iteration and all our constructions are such that the confidence parameter $\delta$ decreases at least linearly with the number of simulations. We therefore obtain that the sequence of failure probabilities for $(1-\epsilon)\textsc{OPT}^{\tau}_{s}\leq\hat{F_{x}}(S)\leq(1+\epsilon)\textsc{OPT}^{\tau}_{s}$ is geometric and sums up to at most $2\delta$ . ∎
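The geometric sum in this proof can be written out explicitly. As a sketch, index the iterations by $i=0,1,2,\dots$ (an indexing introduced here for illustration), so that the $i$-th oracle uses $2^{i}r_{x}(\epsilon,\delta)$ simulations and, by the linear-decrease property, fails on $S$ with probability at most $\delta/2^{i}$. A union bound over all iterations then gives

$$\sum_{i\geq 0}\frac{\delta}{2^{i}}=2\delta .$$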

As an immediate corollary we obtain:

Corollary E.2.

Under the conditions of Lemma E.1, the oracle optimum in all iterations satisfies

[TABLE]

The following is immediate from the construction of the validation oracle.

Lemma E.3.

With probability at least $1-\delta$ , the validation oracle has relative error at most $\epsilon$ on all tests in which the input set $T$ is such that $\texttt{I}^{\tau}(T)\geq\textsc{OPT}^{\tau}_{1}$ and absolute error at most $\epsilon\textsc{OPT}^{\tau}_{1}$ otherwise.

Proof.

The wrapper performs at most $\lceil\log_{2}M/r\rceil$ iterations before it stops. In each iteration the validation oracle fails to provide an $\epsilon$ -approximation with probability at most $\delta_{v}$ . Therefore, by a union bound, the probability that the validation oracle fails to provide an $\epsilon$ -approximation in at least one iteration is at most $\delta_{v}\lceil\log_{2}M/r\rceil\leq\delta$ . ∎

Lemma E.4.

Assume that our data and our optimization oracle with $r$ or more simulations, are such that with probability at least $1-\delta$ , the optimum of the oracle is an approximate optimizer, that is:

[TABLE]

and assume that the algorithm ${\cal A}$ returns the oracle optimum. Then with probability at least $1-5\delta$ , the wrapper terminates after at most $2\max\{r,r_{x}(\epsilon,\delta)\}+r_{v}$ simulations and returns $T$ such that $\texttt{I}^{\tau}(T)\geq(1-5\epsilon)\textsc{OPT}_{s}^{\tau}$ .

Proof.

First we show that the wrapper returns a set $T$ with the required properties with probability at least $1-3\delta$ , and then we show that, with probability at least $1-2\delta$ , the wrapper stops within the claimed number of simulations.

From Lemma E.1, with probability at least $1-2\delta$ in all iterations ${\mathcal{A}}$ returns $T$ for which $\hat{F_{x}}(T)\geq(1-\epsilon)\textsc{OPT}^{\tau}_{s}$ . The validation succeeds only if $\hat{F_{v}}(T)\geq\frac{(1-2\epsilon)}{1+\epsilon}\hat{F_{x}}(T)\geq\frac{(1-\epsilon)(1-2\epsilon)}{1+\epsilon}\textsc{OPT}^{\tau}_{s}$ . From Lemma E.3 with probability at least $1-\delta$ in all iterations we have

[TABLE]

Therefore, with probability $1-3\delta$ the set $T$ returned by the wrapper satisfies

[TABLE]

If $(1+\epsilon)\texttt{I}^{\tau}(T)>\texttt{I}^{\tau}(T)+\epsilon\textsc{OPT}^{\tau}_{1}$ then $\texttt{I}^{\tau}(T)\geq\frac{(1-\epsilon)(1-2\epsilon)}{(1+\epsilon)^{2}}\textsc{OPT}^{\tau}_{s}\geq(1-5\epsilon)\textsc{OPT}^{\tau}_{s}$ . Otherwise, we have that $\texttt{I}^{\tau}(T)\geq\left(\frac{(1-\epsilon)(1-2\epsilon)}{1+\epsilon}-\epsilon\right)\textsc{OPT}^{\tau}_{s}\geq(1-5\epsilon)\textsc{OPT}^{\tau}_{s}.$
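Both final inequalities reduce to polynomial identities in $\epsilon$. As a worked check, assuming only $\epsilon\geq 0$, multiply through by $(1+\epsilon)^{2}$ in the first case and by $(1+\epsilon)$ in the second case:

$$(1-\epsilon)(1-2\epsilon)-(1-5\epsilon)(1+\epsilon)^{2}=11\epsilon^{2}+5\epsilon^{3}\geq 0,$$

$$(1-\epsilon)(1-2\epsilon)-\epsilon(1+\epsilon)-(1-5\epsilon)(1+\epsilon)=6\epsilon^{2}\geq 0.$$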

To finish the proof, we have to show that with probability at least $1-2\delta$ the wrapper returns such a set $T$ within $2\max\{r,r_{x}(\epsilon,\delta)\}+r_{v}$ simulations. Consider the first iteration where $|{\mathcal{R}}|\geq r$ . By Equations (33) and (34) with probability at least $1-\delta$ we have that $\texttt{I}^{\tau}(T)\geq(1-\epsilon)\textsc{OPT}_{s}^{\tau}$ and $(1+\epsilon)\textsc{OPT}_{s}^{\tau}\geq\hat{F_{x}}(T)\geq(1-\epsilon)\textsc{OPT}_{s}^{\tau}$ . By Lemma E.3 we have that with probability at least $1-\delta$ , the validation oracle satisfies that $\hat{F_{v}}(T)\geq(1-\epsilon)\texttt{I}^{\tau}(T)$ or $\hat{F_{v}}(T)\geq\texttt{I}^{\tau}(T)-\epsilon\textsc{OPT}_{1}^{\tau}\geq\texttt{I}^{\tau}(T)-\epsilon\textsc{OPT}_{s}^{\tau}$ . By the last two statements we have that with probability at least $1-2\delta$ :

[TABLE]

or

[TABLE]

∎

Theorem 7.1, which we restate below for ease of reading, now follows as a corollary.

Theorem E.1 (Theorem 7.1).

Suppose that on our data the averaging (respectively, median-of-averages) oracle $\hat{F}$ has the property that with $r$ simulations, with probability at least $1-\delta$ , the oracle optimum $T:=\arg\max_{S\mid|S|\leq s}\hat{F}(S)$ satisfies

[TABLE]

Then with probability at least $1-5\delta$ , when using $2\max\{r,r(\epsilon,\delta)\}+O\left(\epsilon^{-2}c\left(\ln{\frac{1}{\delta}}+\ln\left(\ln\ln\frac{n}{\delta}+\ln s\right)\right)\right)$ simulations with the median-of-averages oracle and $2\max\{r,r(\epsilon,\delta)\}+O\left(\epsilon^{-2}c\left(\ln{\frac{1}{\delta}}+\ln\left(\ln\ln\frac{n}{\delta}+\ln n\right)\right)\right)$ simulations with the averaging oracle, the wrapper outputs a set $T$ such that $\texttt{I}^{\tau}(T)\geq(1-5\epsilon)\textsc{OPT}_{s}^{\tau}$ .

Proof of Theorem 7.1.

We analyze here the number of simulations required using the averaging oracles and the median-of-averages oracles. In both cases we use median-of-averages oracles for validation, so $r_{v}=r(\epsilon,\delta_{v})=O\left(\epsilon^{-2}c\log{\frac{1}{\delta_{v}}}\right)$ , where $\delta_{v}=\frac{\delta}{\lceil\log_{2}{\frac{M}{r_{x}}}\rceil}$ . By Lemma E.4 the number of simulations is at most $2\max\{r,r_{x}(\epsilon,\delta)\}+r_{v}$ . $M$ and $r_{x}$ get different values for each oracle.

median-of-averages oracles analysis

We have that $r_{x}=O(\epsilon^{-2}c\ln{\delta^{-1}})$ by Lemma 6.1 and we set $M=O(\epsilon^{-2}cs\ln{\frac{n}{\delta}})$ by Theorem 6.1. A simple calculation shows that:

[TABLE]

Therefore,

[TABLE]

averaging oracles analysis

We have $r_{x}=O(\epsilon^{-2}c\delta^{-1})$ and we set $M=O\left(\epsilon^{-2}sn\ln{\frac{n}{\delta}}\right)$ according to the respective worst-case guarantees on the number of simulations specified in (1). A simple calculation shows:

[TABLE]

Therefore,

[TABLE]

∎

We next consider cases where the algorithm ${\mathcal{A}}$ is approximate (may not return the oracle optimizer). We assume in these cases that the optimization oracles $\hat{F}_{x}$ when constructed with a given number of simulations provide, with high probability, uniform $\epsilon$ -approximation for all $\binom{n}{s}$ subsets of cardinality at most $s$ :

[TABLE]

We first show that a very weak assumption on $\mathcal{A}$ suffices to guarantee termination with good probability.

Lemma E.5.

Suppose that the optimization oracle $\hat{F_{x}}$ when constructed with $r$ or more simulations provides uniform $\epsilon$ -approximation with probability at least $1-\delta$ , and that the algorithm ${\cal A}$ returns $T$ such that $\hat{F_{x}}(T)\geq(1-\epsilon)\textsc{OPT}_{1}^{\tau}$ . Then with probability at least $1-2\delta$ the wrapper will terminate after using at most $2\max\{r,r_{x}(\epsilon,\delta)\}+r_{v}$ simulations.

Proof.

Consider the first iteration where $\hat{F_{x}}$ is constructed using at least $r$ simulations. Let $T$ be the set that ${\mathcal{A}}$ returns at this iteration. Since $\hat{F}_{x}$ provides uniform $\epsilon$ -approximation we have that $\hat{F_{x}}(T)\leq(1+\epsilon)\texttt{I}^{\tau}(T)$ with probability at least $1-\delta$ . Combining this with our assumption we get that $\texttt{I}^{\tau}(T)\geq\frac{1-\epsilon}{1+\epsilon}\textsc{OPT}^{\tau}_{1}$ . By Lemma E.3, with probability at least $1-\delta$ , if $\texttt{I}^{\tau}(T)\geq\textsc{OPT}_{1}^{\tau}$ then $\hat{F}_{v}(T)\geq(1-\epsilon)\texttt{I}^{\tau}(T)$ , and if $\frac{1-\epsilon}{1+\epsilon}\textsc{OPT}^{\tau}_{1}\leq\texttt{I}^{\tau}(T)\leq\textsc{OPT}^{\tau}_{1}$ then $\hat{F}_{v}(T)\geq\texttt{I}^{\tau}(T)-\epsilon\textsc{OPT}^{\tau}_{1}\geq\texttt{I}^{\tau}(T)-\frac{\epsilon(1+\epsilon)}{1-\epsilon}\texttt{I}^{\tau}(T)$ . Combining we obtain that $\hat{F}_{v}(T)\geq\frac{1-2\epsilon}{1+\epsilon}\hat{F}_{x}(T)$ , and thus the validation condition holds. ∎

We next consider algorithms ${\cal A}$ that guarantee some approximation ratio $\rho$ .

Theorem E.2.

Suppose that our optimization oracle when constructed with $r$ or more simulations provides uniform $\epsilon$ -approximation with probability at least $1-\delta$ . Assume now that the algorithm ${\cal A}$ returns a set $T$ such that

[TABLE]

Then, the set $T$ returned by our wrapper satisfies $\texttt{I}^{\tau}(T)\geq\rho(1-5\epsilon)\textsc{OPT}^{\tau}_{s}$ with probability of at least $(1-3\delta)$ .

Proof.

Consider an optimal set $S$ (with $\texttt{I}^{\tau}(S)=\textsc{OPT}^{\tau}_{s}$ ). By Lemma E.1 with probability at least $1-2\delta$ all our oracles have $(1-\epsilon)\textsc{OPT}^{\tau}_{s}\leq\hat{F_{x}}(S)\leq(1+\epsilon)\textsc{OPT}^{\tau}_{s}$ . By the assumption, the sets $T$ returned by ${\cal A}$ in all iterations have $\hat{F_{x}}(T)\geq\rho\max_{S\mid|S|\leq s}\hat{F_{x}}(S)\geq\rho(1-\epsilon)\textsc{OPT}^{\tau}_{s}$ . When the wrapper stops we have that $\hat{F_{x}}(T)\leq\frac{1+\epsilon}{1-2\epsilon}\hat{F_{v}}(T)$ and by Lemma E.3 we have with probability at least $1-\delta$ that $\hat{F_{v}}(T)\leq\max\{(1+\epsilon)\texttt{I}^{\tau}(T),\texttt{I}^{\tau}(T)+\epsilon\textsc{OPT}^{\tau}_{1}\}$ .

Combining, we have that with probability at least $1-3\delta$ ,

[TABLE]

Now, a simple calculation shows that $\texttt{I}^{\tau}(T)\geq\rho(1-5\epsilon)\textsc{OPT}^{\tau}_{s}$ .

∎

We can now prove Theorem 7.2 (restated for ease of reading):

Theorem E.3 (Theorem 7.2).

If the averaging oracle $\hat{{\sf A}}$ has the property that with $\geq r$ simulations, with probability at least $1-\delta$ , it provides a uniform $\epsilon$ -approximation for all subsets of size at most $s$ , then with $2\max\{r,r(\epsilon,\delta)\}+O\left(\epsilon^{-2}c\left(\ln{\frac{1}{\delta}}+\ln\left(\ln\ln\frac{n}{\delta}+\ln n\right)\right)\right)$ simulations we can find in polynomial time a $(1-(1-1/s)^{s})(1-5\epsilon)$ approximate solution with confidence $1-5\delta$ .

Proof.

The averaging oracle is monotone and submodular [29] and therefore greedy can efficiently recover a set $T$ such that $\hat{F_{x}}(T)\geq(1-(1-1/s)^{s})\max_{S\mid|S|\leq s}\hat{F_{x}}(S)$ .

By Lemma E.5, the wrapper terminates using at most $2\max\{r,r_{x}(\epsilon,\delta)\}+r_{v}$ simulations with probability at least $1-2\delta$ . Applying Theorem E.2 with $\rho=(1-(1-1/s)^{s})$ , we get that $\texttt{I}^{\tau}(T)\geq(1-(1-1/s)^{s})(1-5\epsilon)\textsc{OPT}^{\tau}_{s}$ with probability at least $1-3\delta$ . Hence, with probability at least $1-5\delta$ the wrapper applied with greedy finds a $(1-(1-1/s)^{s})(1-5\epsilon)$ -approximate solution using $2\max\{r,r_{x}(\epsilon,\delta)\}+r_{v}$ simulations. ∎

Appendix F Variance bounds for dependent models

In this section we provide a proof for Corollary 4.1. We consider natural extensions of IC models: $b$ -dependence models, which allow for some dependencies between edges, and mixtures of IC and IGT models. For these extensions, we establish upper bounds of the form (8) on the variance of the reachability of a set of nodes.

We bound the variance by constructing for each dependent model a corresponding IC model and then apply the variance upper bound established in Section A for IC models.

For mixture models we provide a generic derivation that bounds the variance of the mixture by variance of components.

F.1 $b$ -dependence models

The first family we consider are $b$ -dependence models, which we define as follows. We assume that all edges with the same tail node are partitioned into disjoint groups where each group is of size at most $b$ . The edges of each group $B$ are either all active together with probability $p_{B}$ or none is active with probability $1-p_{B}$ . The special case where all groups are of size $1$ corresponds to an IC model (where all edges are independent).

Theorem F.1.

Let $\mathcal{G}$ be a $b$ -dependence model for some $b\geq 1$ . For every set $T$ we have that:

[TABLE]

Proof.

We construct an IC model $\mathcal{G}^{\prime}$ from the given $b$ -dependence model $\mathcal{G}$ . The model $\mathcal{G}^{\prime}$ is defined over the set of nodes $V$ of $\mathcal{G}$ together with an additional set $D$ of dummy nodes. The construction has the property that the $2\tau$ -step influence in $\mathcal{G}^{\prime}$ from a set of nodes $T\subseteq V$ is equal to the $\tau$ -step influence of $T$ in $\mathcal{G}$ . Furthermore, the variance of the size of the $2\tau$ -step reachability of $T$ in $\mathcal{G}^{\prime}$ is the same as the variance of the $\tau$ -step reachability of $T$ in $\mathcal{G}$ . The influence of each dummy node in $\mathcal{G^{\prime}}$ is at most $b\max_{v\in V}\texttt{I}^{\tau}(v)$ . The claim follows from these properties and Theorem 4.1.

Here is a formal description of our reduction.

We start by putting in $\mathcal{G}^{\prime}$ the set $V$ of the nodes of $\mathcal{G}$ . Then for every group $B=\{(u,v_{1}),(u,v_{2}),...,(u,v_{\ell})\}$ in $\mathcal{G}$ we do the following:

1. Add a new dummy node $v_{B}$ to $\mathcal{G}^{\prime}$ , add to $\mathcal{G}^{\prime}$ the edge $(u,v_{B})$ , and give it the probability $p_{B}$ . We assign weight $0$ to $v_{B}$ so that it does not contribute to the reachability of any set of nodes.

2. Create edges $(v_{B},v_{i})$ for every $1\leq i\leq\ell$ ; each such edge has probability 1.

Let $T\subset V$ be a set of nodes in $\mathcal{G}$ . It follows from our construction that for any set of nodes $B\subseteq V$ the probability that $\texttt{Reach}^{2\tau}_{\mathcal{G}^{\prime}}(T)\cap V=B$ is the same as the probability that $\texttt{Reach}^{\tau}_{\mathcal{G}}(T)=B$ . Since dummy nodes have weight $0$ , this implies that for any $T\subseteq V$

[TABLE]

and

[TABLE]

Each dummy node is connected to at most $b$ original nodes; hence, for every dummy node $v_{B}$ , $\texttt{I}^{2\tau}_{\mathcal{G}^{\prime}}(v_{B})$ is bounded by $b\max_{v\in V}\texttt{I}^{2\tau}_{\mathcal{G}^{\prime}}(v)$ . By Theorem A.3 it follows that for every set of nodes $T$ in $\mathcal{G^{\prime}}$ :

[TABLE]

Combining all these observations together, we get that

[TABLE]

∎
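The reduction in this proof can be checked by exhaustive enumeration on a tiny instance. The sketch below is an illustration under assumptions: groups are given as hypothetical triples (tail, heads, $p_{B}$), original nodes have weight $1$, and dummy nodes are named `("dummy", idx)`. It builds $\mathcal{G}^{\prime}$ and compares the exact distribution of the $\tau$-step weighted reachability in $\mathcal{G}$ with the $2\tau$-step one in $\mathcal{G}^{\prime}$.

```python
from collections import defaultdict, deque
from fractions import Fraction
from itertools import product


def reduce_to_ic(groups):
    """Build the IC model G': one weight-0 dummy node per group."""
    edge_prob, weight = {}, {}
    for idx, (u, heads, p_b) in enumerate(groups):
        dummy = ("dummy", idx)
        weight[dummy] = 0                  # dummies do not count as reached
        edge_prob[(u, dummy)] = p_b        # the whole group fires w.p. p_B
        for v in [u] + list(heads):
            weight[v] = 1
        for v in heads:
            edge_prob[(dummy, v)] = Fraction(1)
    return edge_prob, weight


def weighted_reach(live, weight, seeds, steps):
    """Total weight of nodes within `steps` hops of the seeds."""
    hops = {u: 0 for u in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        if hops[u] == steps:
            continue
        for v in live.get(u, ()):
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return sum(weight[v] for v in hops)


def reach_distribution(outcomes, weight, seeds, steps):
    """outcomes: list of (probability, edges that become live together)."""
    dist = defaultdict(Fraction)
    for bits in product([0, 1], repeat=len(outcomes)):
        prob, live = Fraction(1), defaultdict(list)
        for on, (p, edges) in zip(bits, outcomes):
            prob *= p if on else 1 - p
            if on:
                for a, b in edges:
                    live[a].append(b)
        if prob > 0:
            dist[weighted_reach(live, weight, seeds, steps)] += prob
    return dict(dist)


groups = [("a", ["b", "c"], Fraction(1, 2)),
          ("b", ["d"], Fraction(1, 3)),
          ("a", ["d"], Fraction(1, 4))]
edge_prob, weight = reduce_to_ic(groups)
tau = 2
original = reach_distribution(
    [(p, [(u, v) for v in heads]) for u, heads, p in groups],
    weight, ["a"], tau)
reduced = reach_distribution(
    [(p, [e]) for e, p in edge_prob.items()], weight, ["a"], 2 * tau)
print(original == reduced)
```

Enumeration is exponential in the number of groups and edges, so this is only meant as a check on toy inputs.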

This theorem can be generalized to more complex dependencies. For example, it holds for any distribution on subsets of the outgoing edges from each node that we can realize by a distribution on disjoint subsets, where we draw each subset with a certain probability and take the union of the subsets that we draw.

F.2 Mixture of IC and IGT models

The second family of dependent models we consider is a mixture of IC and IGT models.

Consider a set of models $\mathcal{G}_{i}(V)$ for $i\in[r]$ and respective probabilities $p_{i}$ such that $\sum_{i=1}^{r}p_{i}=1$ . We define a mixture model $\mathcal{G}(V)$ as follows. To draw $\boldsymbol{\phi}\sim\mathcal{G}$ , we first draw $i\in[r]$ according to probabilities $p_{i}$ and then return $\boldsymbol{\phi}\sim\mathcal{G}_{i}$ .
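The two-stage draw that defines the mixture can be sketched as follows. This is a minimal illustration in which each component is represented by a hypothetical sampler function. The IC-style sampler shown is an assumption made for the example and returns a set of live edges in place of general activation functions $\boldsymbol{\phi}$.

```python
import random


def ic_sampler(edge_prob):
    """Sampler for an IC component: each edge is live independently."""
    def sample(rng):
        return {e for e, p in edge_prob.items() if rng.random() < p}
    return sample


def draw_from_mixture(components, probs, rng):
    """First draw the component index i with probability p_i, then draw
    one simulation from component i."""
    i = rng.choices(range(len(components)), weights=probs, k=1)[0]
    return i, components[i](rng)


components = [ic_sampler({("a", "b"): 1.0}), ic_sampler({("a", "c"): 1.0})]
index, live_edges = draw_from_mixture(components, [1.0, 0.0], random.Random(0))
print(index, live_edges)
```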

We provide two proofs for the variance bound of the mixture. The first is direct and applies to any mixture of models that satisfy the variance bound of Theorem 4.1, and in particular to mixtures of strongly submodular SDMs. The second proof is specific to live-edge models and based on a reduction to an IC model.

Theorem F.2.

Consider a model $\mathcal{G}$ that is a mixture of $r$ models $\mathcal{G}_{i}$ with probabilities $p_{i}$ that satisfy the variance bound of Theorem 4.1. Then for all $T\subset V$ ,

[TABLE]

Proof.

We first relate the influence of $T$ in the mixture model to the influence of $T$ in the components.

[TABLE]

This holds for any set $T$ and any $\tau$ . Therefore we also obtain the inequality

[TABLE]

It also follows that we can bound the influence values in each component by the respective ones in the mixture: $\texttt{I}^{\tau}_{\mathcal{G}_{i}}(T)\leq\frac{1}{p_{i}}\texttt{I}^{\tau}_{\mathcal{G}}(T)$ and thus

[TABLE]

The random variable $\texttt{R}^{\tau}_{\mathcal{G}}(T)$ can be expressed as a sum of $r$ products of random variables:

[TABLE]

where $X_{i}$ are Bernoulli with probabilities $p_{i}$ . The random variables $\{\texttt{R}^{\tau}_{\mathcal{G}_{i}}(T)\}$ are independent of each other and also are independent from (the joint distribution of) $\{X_{i}\}$ . The variables $\{X_{i}\}$ have negative dependence as $\sum_{i}X_{i}=1$ and thus the products $X_{i}\texttt{R}^{\tau}_{\mathcal{G}_{i}}(T)$ are also negatively dependent and thus

[TABLE]

We will instead bound the variance of a surrogate random variable

[TABLE]

that has the same sum of products but with the variables $\{X_{i}\}$ being independent of each other and hence the products are also independent. We have

[TABLE]
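The comparison between the mixture and its surrogate can be checked numerically from first and second moments alone. The sketch below assumes only what the text states: the $X_{i}$ are Bernoulli with probabilities $p_{i}$ and are independent of the component reachability values. The function names and the example moments are hypothetical.

```python
def mixture_variance(probs, means, second_moments):
    """Variance of sum_i X_i R_i when exactly one X_i equals 1 (one-hot X)."""
    first = sum(p * m for p, m in zip(probs, means))
    second = sum(p * s for p, s in zip(probs, second_moments))
    return second - first ** 2

def surrogate_variance(probs, means, second_moments):
    """Variance of sum_i X_i R_i with independent X_i ~ Bernoulli(p_i)."""
    return sum(p * s - (p * m) ** 2
               for p, m, s in zip(probs, means, second_moments))

# Hypothetical component moments: E[R_i] and E[R_i^2] for three components.
probs = [0.5, 0.3, 0.2]
means = [4.0, 10.0, 1.0]
second_moments = [20.0, 120.0, 1.0]

v_mix = mixture_variance(probs, means, second_moments)
v_sur = surrogate_variance(probs, means, second_moments)
gap = v_sur - v_mix  # equals (sum p_i m_i)^2 - sum (p_i m_i)^2, nonnegative
```

The gap is a sum of cross terms $p_{i}p_{j}\,\mathbb{E}[R_{i}]\,\mathbb{E}[R_{j}]$ over $i\neq j$ , which is nonnegative because reachability values are nonnegative; this is why bounding the surrogate suffices.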

We next express the variance of each product using variance properties of the product of two independent random variables. For $i\in[r]$ :

[TABLE]
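For reference, the standard identity for the variance of a product of two independent random variables, specialized to a Bernoulli factor, is written out below. This is a generic fact stated in the notation of the proof, with the shorthand $R_{i}=\texttt{R}^{\tau}_{\mathcal{G}_{i}}(T)$ introduced here for brevity; it is not claimed to match the display above term by term.

```latex
\begin{align*}
\operatorname{Var}[XY]
  &= \operatorname{Var}[X]\operatorname{Var}[Y]
   + \operatorname{Var}[X]\,\mathbb{E}[Y]^{2}
   + \operatorname{Var}[Y]\,\mathbb{E}[X]^{2}, \\
\operatorname{Var}[X_{i}R_{i}]
  &= p_{i}(1-p_{i})\operatorname{Var}[R_{i}]
   + p_{i}(1-p_{i})\,\mathbb{E}[R_{i}]^{2}
   + p_{i}^{2}\operatorname{Var}[R_{i}] \\
  &= p_{i}\operatorname{Var}[R_{i}]
   + p_{i}(1-p_{i})\,\mathbb{E}[R_{i}]^{2}.
\end{align*}
```

Here $\mathbb{E}[R_{i}]=\texttt{I}^{\tau}_{\mathcal{G}_{i}}(T)$ , so the second term is controlled by the component influence and the first by the component variance.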

Therefore, invoking Theorem 4.1 to bound the variance for each component model $\mathcal{G}_{i}$ and then using (37) and (38) and finally using (40) and (44) we get

[TABLE]

∎

We next give a different proof (of a slightly different bound) for live-edge models using a reduction to an IC model. Consider a set of $\tau$-step models $\mathcal{G}_{i}(V,\mathcal{E}_{i})$ for $i\in[r]$ and respective probabilities $p_{i}$ such that $\sum_{i=1}^{r}p_{i}=1$ . We define a mixture model $\mathcal{G}(V,\bigcup_{i}\mathcal{E}_{i})$ as follows. To draw $E\sim\mathcal{G}$ , we first draw $i\in[r]$ according to probabilities $p_{i}$ and then return $E\sim\mathcal{G}_{i}$ .

Theorem F.3.

Consider a model $\mathcal{G}$ that is a mixture of $r$ IC models $\mathcal{G}_{i}$ with probabilities $p_{i}$ . Then for all $T\subset V$ ,

[TABLE]

Proof.

We first argue that we can assume without loss of generality that $T$ is a single node and $\bigcup_{i}\mathcal{E}_{i}$ does not contain edges that are incoming to $T$ . We can transform a general instance of $\mathcal{G}$ and $T$ into this form by contracting all nodes in $T$ into a single node and deleting all edges that are incoming to $T$ . We then retain the same conditional distribution on the remaining edges. Note that this transformation preserves the distribution of $\texttt{R}^{\tau}_{\mathcal{G}}(T)$ and hence also its expectation and variance. The influence values $\texttt{I}^{\tau}_{\mathcal{G}}(v)$ of nodes $v\in V\setminus T$ can only decrease. Finally, the transformed model is also a mixture of correspondingly transformed IC models, where in each such model the distribution of $\texttt{R}^{\tau}_{\mathcal{G}_{i}}(T)$ remains the same and the influence values $\texttt{I}^{\tau}_{\mathcal{G}_{i}}(v)$ can only decrease. It follows that the claimed variance bound for the transformed model implies the same bound for the original model.
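The contraction step can be sketched for a single IC component as follows. This is an illustration under stated assumptions: the component is given as a map from directed edges to independent probabilities, and parallel edges created by the contraction are merged into one edge that is live when at least one of them is live, which is a choice made here for IC models and not a detail taken from the paper. The function name `contract_seed_set` and the merged node label are hypothetical.

```python
def contract_seed_set(edge_probs, seed_set, merged="T"):
    """Contract seed_set into one node and delete edges entering it.

    edge_probs maps (u, v) to the independent probability that the edge
    is live. Edges whose head lies in seed_set are dropped, which also
    removes edges between two seed nodes.
    """
    contracted = {}
    for (u, v), p in edge_probs.items():
        if v in seed_set:
            continue  # incoming to T: deleted
        tail = merged if u in seed_set else u
        previous = contracted.get((tail, v), 0.0)
        # Merge parallel independent edges: live if at least one is live.
        contracted[(tail, v)] = 1.0 - (1.0 - previous) * (1.0 - p)
    return contracted

# Usage on a small hypothetical component with seed set {s1, s2}.
example = {
    ("s1", "a"): 0.5,
    ("s2", "a"): 0.5,
    ("a", "s1"): 0.9,   # incoming to the seed set, removed
    ("s1", "s2"): 1.0,  # inside the seed set, removed
    ("a", "b"): 0.3,
}
result = contract_seed_set(example, {"s1", "s2"})
```

Since every seed node is active at step 0, a node one hop from any seed is reached in one step both before and after the contraction, so the step-limited reachability of the seed set has the same distribution.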

We construct a new IC model $\mathcal{G^{\prime}}$ with respect to (a single node) $T$ as follows. The new model has nodes $V^{\prime}=\{v\}\cup\bigcup_{i}V_{i}$ , where each $V_{i}$ is a copy of $V$ . We create an instantiation of each of our IC models $\mathcal{G}_{i}$ with the set of nodes $V_{i}$ and edges $\mathcal{E}_{i}$ , with the probabilities as in the model $\mathcal{G}_{i}$ . The new IC model $\mathcal{G^{\prime}}$ has a root node $v$ with weight [math] and, for each $i\in[r]$ , an edge $(v,T_{i})$ with probability $p_{i}$ , where $T_{i}$ is the image of $T$ in the copy of $\mathcal{G}_{i}$ . We can see that

[TABLE]

that is, the $(\tau+1)$-step influence of $v$ in the constructed IC model $\mathcal{G^{\prime}}$ is equal to the $\tau$-step influence of $T$ in the mixture model $\mathcal{G}$ .
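The construction of $\mathcal{G^{\prime}}$ can be sketched as a plain edge map. This is a minimal illustration, assuming each component is given as a map from directed edges to probabilities and that $T$ has already been contracted to a single node; node weights are left out, and the names `build_reduction` and `"root"` are hypothetical.

```python
def build_reduction(components, probs, target, root="root"):
    """Return the edge-probability map of the combined IC model.

    components: list of dicts {(u, v): probability}, one per component.
    Nodes of copy i are renamed to (i, node), so the copies are disjoint.
    The root gets one edge to the image (i, target) of the seed node in
    each copy, with probability probs[i].
    """
    edges = {}
    for i, (edge_probs, p_i) in enumerate(zip(components, probs)):
        edges[(root, (i, target))] = p_i
        for (u, v), q in edge_probs.items():
            edges[((i, u), (i, v))] = q
    return edges

# Usage with two small hypothetical components sharing the seed node "t".
components = [
    {("t", "a"): 0.5},
    {("t", "b"): 0.2, ("b", "c"): 1.0},
]
combined = build_reduction(components, [0.3, 0.7], target="t")
```

In the combined model the root edges are drawn independently, so zero, one, or several copies may be activated in a single simulation, whereas the mixture activates exactly one. The expectations agree, and the difference shows up only in the variance, which is the point of the comparison that follows.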

We next consider the variance of the random variables $\texttt{R}^{\tau}_{\mathcal{G}}(T)$ and $\texttt{R}^{\tau+1}_{\mathcal{G^{\prime}}}(v)$ . Both these random variables are a sum of $r$ products of random variables:

[TABLE]

where $X_{i}$ are Bernoulli with probabilities $p_{i}$ . In both cases the random variables $\{\texttt{R}^{\tau}_{\mathcal{G}_{i}}(T)\}$ are independent of each other and are also independent of (the joint distribution of) $\{X_{i}\}$ . In the case of $\texttt{R}^{\tau+1}_{\mathcal{G^{\prime}}}(v)$ , the random variables $X_{i}$ are independent and hence the products are also independent. In the case of $\texttt{R}^{\tau}_{\mathcal{G}}(T)$ , the variables $\{X_{i}\}$ have negative dependence, as $\sum_{i}X_{i}=1$ , and thus the products $X_{i}\texttt{R}^{\tau}_{\mathcal{G}_{i}}(T)$ are also negatively dependent. Therefore,

[TABLE]

Finally, we bound $\texttt{M}^{\tau}_{\mathcal{G}^{\prime}}(\bar{v})$ by considering the maximum influence of a node other than $v$ in the constructed model $\mathcal{G}^{\prime}$ . For $T_{i}$ we have

[TABLE]

where the last inequality follows from (40). We next consider a node $z_{i}\in V_{i}$ that is the image of a node $z\in V$ .

[TABLE]

The last inequality follows because for any node $z\in V$ we have $\texttt{I}^{\tau}_{\mathcal{G}}(z)=\sum_{i=1}^{r}p_{i}\texttt{I}^{\tau}_{\mathcal{G}_{i}}(z)$ . Combining (42) and (43) we get

[TABLE]

To conclude, we invoke Theorem 4.1 for the IC model $\mathcal{G}^{\prime}$ :

[TABLE]

We then apply inequality (44) and equality (40) to obtain the claim. ∎

