Designs for estimating the treatment effect in networks with interference

Ravi Jagadeesan; Natesh Pillai; Alexander Volfovsky

arXiv:1705.08524·math.ST·May 25, 2017


TL;DR

This paper proposes new experimental designs for estimating causal effects in networked settings with interference, using a graph coloring approach to improve estimator properties.

Contribution

It introduces a novel quasi-coloring design inspired by matching, enhancing causal inference in networks with interference and homophily.

Findings

01

Classical Neymanian estimator performs well with the new designs.

02

Designs are easily implementable and effective in various interference scenarios.

03

The approach extends to networks with homophily.

Abstract

In this paper we introduce new, easily implementable designs for drawing causal inference from randomized experiments on networks with interference. Inspired by the idea of matching in observational studies, we introduce the notion of considering a treatment assignment as a “quasi-coloring” on a graph. Our idea of a perfect quasi-coloring strives to match every treated unit on a given network with a distinct control unit that has an identical number of treated and control neighbors. For a wide range of interference functions encountered in applications, we show both by theory and simulations that the classical Neymanian estimator for the direct effect has desirable properties for our designs. This further extends to settings where homophily is present in addition to interference.


Equations

$$d_{\min}=\min_{v\in V(G)}d(v),\quad d_{\max}=\max_{v\in V(G)}d(v).$$

$$\binom{V(G)}{m_{1},\ldots,m_{k}}=\Big\{(A_{1},\ldots,A_{k}),\,A_{k}\subset V(G),\,|A_{k}|=m_{k},\,A_{k}\cap A_{\ell}=\emptyset,\ \forall k\neq\ell\Big\}.$$

$$\mathrm{1}_{\mathrm{T}}(v)=\begin{cases}1&\text{if }v\in\mathrm{T},\\ 0&\text{if }v\notin\mathrm{T}.\end{cases}$$

$$\chi^{\mathrm{T}}_{v}=\begin{cases}q&\text{if }v\in\mathrm{T},\\ -p&\text{if }v\notin\mathrm{T}.\end{cases}$$

$$y_{v}=x_{v}+\mathrm{1}_{\mathrm{T}}(v)\,t_{v}+f_{v}(\mathrm{T}\cap\mathcal{N}(v)),\quad v\in V(G)$$

$$\bar{t}=\frac{1}{rn}\sum_{v\in V(G)}t_{v}.$$

$$\widehat{t}_{\mathrm{Neyman}}=\frac{1}{pqn}\Big(q\sum_{v\in\mathrm{T}}y_{v}-p\sum_{v\in V(G)\setminus\mathrm{T}}y_{v}\Big)=\frac{1}{pqn}\sum_{v\in V(G)}\chi^{\mathrm{T}}_{v}y_{v}.$$

$$\widehat{t}_{\mathrm{Neyman}}=\frac{1}{n}\Big(\sum_{v\in\mathrm{T}}y_{v}-\sum_{v\in V(G)\setminus\mathrm{T}}y_{v}\Big).$$

$$t_{\mathrm{ideal}}=\frac{1}{pqn}\Big(q\sum_{v\in\mathrm{T}}(x_{v}+t_{v})-p\sum_{v\in V(G)\setminus\mathrm{T}}x_{v}\Big).$$

$$\xi=\widehat{t}_{\mathrm{Neyman}}-t_{\mathrm{ideal}}=\frac{1}{pqn}\sum_{v\in V(G)}\chi^{\mathrm{T}}_{v}f_{v}(\mathrm{T}\cap\mathcal{N}(v)).$$

$$t_{\mathrm{ideal}}=\frac{1}{pqn}\sum_{v\in V(G)}\chi^{\mathrm{T}}_{v}(x_{v}+\mathrm{1}_{\mathrm{T}}(v)\,t_{v}).$$

$$\mathbb{E}_{\mathrm{T}}(t_{\mathrm{ideal}})=\bar{t}.$$

$$\mathbb{E}_{\mathrm{T}}(\widehat{t}_{\mathrm{Neyman}})-\bar{t}=\mathbb{E}_{\mathrm{T}}(t_{\mathrm{ideal}}+\xi)-\bar{t}=\mathbb{E}_{\mathrm{T}}(\xi)$$

$$\mathrm{T}_{\vec{B},\mathcal{P}}=\{w^{j}_{i}\mid j\in B_{i}\}.$$

$$\left|f_{v}(A)-f_{v}(B)\right|\leq\frac{K_{v}|A\Delta B|}{d(v)}$$

$$\mathcal{P}_{v}=S_{i},\quad v\in S_{i}.$$

$$\left|\mathbb{E}_{\mathrm{T}}(\xi)\right|\leq\frac{1}{nr(r-1)}\sum_{v\in V(G)}\frac{\left|\mathcal{P}_{v}\cap\mathcal{N}(v)\right|}{d(v)}K_{v}.$$

$$\bar{K}=\frac{1}{rn}\sum_{v\in V(G)}K_{v}.$$

$$\mathbb{E}_{\mathcal{P}}\left|\mathbb{E}_{\vec{B}}(\xi\mid\mathcal{P})\right|\leq\frac{\bar{K}}{rn-1},$$

$$\left|\mathbb{E}_{\mathrm{T}}(\xi)\right|\leq\frac{\bar{K}}{rn-1}.$$

$$d(V(G))=\{d(v),v\in V(G)\}$$

$$\mathcal{B}=\{(a,b)\in\mathbb{Z}_{\geq 0}^{2}\mid a+b\in d(V(G))\}.$$

$$f_{v}(S)=f(|S|,|\mathcal{N}(v)\setminus S|)$$

$$\vec{d}_{\mathrm{T}}(v)=(|\mathrm{T}\cap\mathcal{N}(v)|,|\mathcal{N}(v)\setminus\mathrm{T}|)$$

$$D_{\mathrm{T}}(u)=\frac{1}{pqn}\sum_{v\in V}\chi^{\mathrm{T}}_{v}\delta_{\vec{d}_{\mathrm{T}}(v)}(u),\quad u\in\mathcal{B}$$

$$\xi=\int_{\mathcal{B}}f\,dD_{\mathrm{T}}=\sum_{u\in\mathcal{B}}f(u)D_{\mathrm{T}}(u).$$

$$Q=\{(v,(\epsilon_{w})_{w\in V(H)})\mid u_{v}=1\}$$

$$\psi(v,\epsilon)=(v,(\epsilon_{V(H)\setminus\{v\}},1-\epsilon_{v})).$$

$$\|f\|_{\mathbf{d}}=\sup_{u_{1},u_{2}\in\mathcal{B},\,u_{1}\neq u_{2}}\frac{|f(u_{1})-f(u_{2})|}{\mathbf{d}(u_{1},u_{2})}.$$

$$\|D\|_{\mathbf{d}_{\mathrm{w}}}=\sup_{\|f\|_{\mathbf{d}}\leq 1}\Big\|\int_{\mathcal{B}}f\,dD\Big\|.$$


Full text

Ravi Jagadeesan and Natesh S. Pillai (Harvard University), Alexander Volfovsky (Duke University)


Keywords: Experimental Design, Network Interference, Neyman Estimator, Symmetric Interference Model, Homophily.

Ravi Jagadeesan: Undergraduate Student, Harvard University. Natesh S. Pillai: Associate Professor, Department of Statistics, Harvard University. Alexander Volfovsky: Assistant Professor, Department of Statistical Science, Duke University.

1 Introduction.

In this paper, we construct and analyze new designs for estimating treatment effects from randomized experiments in networks with interference. With the proliferation of network data and the steady increase in the number of experiments conducted on networks, understanding the behavior of individuals in a network has become an important issue in many scientific fields. Epidemiologists study the transmission of disease over social networks [1], computer scientists are interested in information diffusion in large computer networks [23, 7] and sociologists study the effects of school integration on friendship networks [15]. While much of the early statistical work on networks focussed on models for understanding network formation [8, 10], there has been a recent surge in drawing causal inference from experiments on networks [19, 6, 20, 22, 21].

A time-honored approach to performing causal inference from randomized experiments entails the following steps [9, 17, 16]: (i) define the population of units, (ii) define the treatment assignment and (iii) define the quantity, or estimand, of interest. When an experiment is conducted on a network, we must revisit each of these elements. First, the object of inference can be the network, the edges of the network or the nodes of the network. We focus on the case where the nodes are the experimental units and our population is just the observed units. Next, the treatment assignment mechanisms proposed in this paper are conditional on a given network and thus the events that any two units receive treatment are not independent. This is in stark contrast to usual Bernoulli-type randomization mechanisms where treatment is assigned to units independently or with very weak dependence. Finally, our estimand of interest is the direct treatment effect (effect of treatment on the treated unit irrespective of the treatment status of the rest of the network) discussed below.

Much of the current work on causal inference on networks studies generic Bernoulli-type randomization schemes and constructs estimators that minimize the mean-squared error (MSE); a notable exception is the recent work [6]. In contrast, we fix an estimator of interest and focus on the design of treatment assignments. We study the classical Neymanian estimator, which takes the difference between the mean outcomes of the treated nodes and the control nodes. Our approach is motivated by two key reasons: (i) The Neymanian estimator is ubiquitously used. It is a natural estimator for the direct effect and improves on reweighted versions of it (such as Horvitz-Thompson, Hajek, etc.) due to its prima facie interpretability. (ii) It has been emphasized by many researchers that for objective causal inference, “design trumps analysis” [18]. It is known that this estimator is biased under standard designs such as Bernoulli trials (every unit has probability of treatment $p$) and a completely randomized design (a fraction $p$ of the units is assigned to treatment). We consider a more natural randomization scheme that works to remove the effects of interference and homophily by balancing the relevant distributions between treated and untreated nodes.

Conceptually, our main contribution is the idea of considering a treatment assignment as a “quasi-coloring” of a graph (see Definition 5.1); we use the phrase “quasi-coloring” because the word “coloring” is reserved for something specific in graph theory. Roughly speaking, a treatment assignment is a perfect quasi-coloring if, for every treated vertex $v$ (represented by black dots, say), there is a non-treated vertex $v^{\prime}$ (represented by gray dots) that has the same number of treated and non-treated neighbors as $v$. Thus having a perfect quasi-coloring on a graph $G$ ensures that one can color the graph in such a way that for every black vertex, there exists a distinct gray vertex with identically colored neighbors. Figure 1 shows two instances of coloring a square, where one is a perfect quasi-coloring and the other is not. For multiple treatments, this definition can be naturally extended to perfect quasi-colorings with $k$ colors.

Our notion of perfect quasi-colorings is inspired by the idea of covariate balance in the context of matching in observational studies. For any given network, if a treatment assignment mechanism satisfies our notion of quasi-coloring, we prove that the Neymanian estimator for the direct treatment effect is unbiased for a wide range of families of interference effects encountered in practice. This replicates the behavior of the Neymanian estimator in classical randomized experiments. It turns out that, for many graphs, perfect quasi-colorings are not available or may be very difficult to construct. To circumvent this issue, we develop treatment assignment mechanisms that correspond to “approximately perfect quasi-colorings”. The closer an approximately perfect quasi-coloring is to a perfect quasi-coloring, the smaller its bias. Based on this notion we develop a new restricted randomization design that reduces bias and variance. In networks where a perfect quasi-coloring is not possible, we give easily implementable algorithms to construct designs with desirable properties; see the “partitioning by degree” design in Definition 6.1.

We give bounds for the bias and variance of our estimator under a few different settings of approximate quasi-colorings. These results are then used to prove asymptotic consistency of our estimator in both dense and sparse asymptotic regimes for network growth. We also derive bounds for the MSE of the Neymanian estimator under homophily. We demonstrate the efficacy of our proposed randomization scheme in a series of simulations that vary both the type of interference and the network. Our proofs for dense and sparse graphs are different and thus are of independent interest.

1.1 Background and literature.

We briefly survey the relevant literature, point out connections to the present paper and place our work in a broader context. In situations when the experimental units are connected in a network, some of the usual assumptions used in other settings are not likely to hold. For example, the stable unit treatment value assumption [16] requires that the outcome for a unit only depends on its own treatment, and in particular is independent of the treatment assignment mechanism. For networks this can be violated in several ways: it is likely that either the behavior of connected units is similar (homophily), that their outcomes are associated with the treatment of their network neighbors (interference) or that the treatment effect passes temporally across the network (contagion). It has been previously demonstrated that while these can affect causal inference on a network differently, they are difficult, if not impossible, to distinguish [19]. These complications lead to a difficulty in specifying an estimand of interest [11]. The four main estimands in the presence of a network are (i) the effect of treatment were it applied to the whole network versus no one in the network (total network treatment effect), (ii) the direct effect of treatment on the treated unit irrespective of the treatment status of the rest of the network (direct treatment effect), (iii) the spillover effect of treatment of the network on a single unit irrespective of its treatment (indirect treatment effect), and (iv) the sum of the direct and indirect effect (total nodal treatment effect).

Different estimands lead to different inference procedures, both from a design and an analysis point of view. We focus on the design of experiments targeting the direct treatment effect. Other recent work has targeted different estimands. Choi [5] studies estimators for monotone treatment effects and constructs asymptotically consistent bounds for such estimates. Eckles, Karrer and Ugander [6] study total network effects by considering a cluster-randomized design in conjunction with Horvitz-Thompson and Hajek estimators. Sussman and Airoldi [20] construct unbiased estimators for direct and indirect treatment effects for a fixed design.

The above works study the effects of interference on estimation and make the common assumption that the interference is limited to the immediate neighborhood of a node. We will also make this assumption but our work can be easily generalized to different patterns of interference; see Discussion for more on this point. Another simplifying assumption that is frequently made requires the interference effect to be symmetric – that is each interfering unit contributes the same indirect effect. We demonstrate results under several classes of interference patterns that generalize this assumption.

1.2 Notation.

Fix $n\in\mathbb{N}$ and let $G$ be a graph with $|V(G)|=rn$ . Throughout the paper, we will assume that $G$ has no isolated vertices. Let $\mathcal{N}(v)$ denote the set of neighbors of a vertex $v\in V(G)$ and let $d(v)=|\mathcal{N}(v)|$ denote the degree of $v$ . Also define the minimum and maximum degrees:

$$d_{\min}=\min_{v\in V(G)}d(v),\quad d_{\max}=\max_{v\in V(G)}d(v).$$

We will denote by $\binom{V(G)}{k}$ , the set containing all subsets of $V(G)$ with cardinality $k$ . Similarly, for $1\leq m_{k}\leq rn$ with $\sum m_{k}=rn$ , define

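$$\binom{V(G)}{m_{1},\ldots,m_{k}}=\Big\{(A_{1},\ldots,A_{k}),\,A_{k}\subset V(G),\,|A_{k}|=m_{k},\,A_{k}\cap A_{\ell}=\emptyset,\ \forall k\neq\ell\Big\}.$$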

In particular, $\binom{V(G)}{r,\ldots,r}$ denotes the set of all partitions of $V(G)$ into sets of size $r$. For $r\in\mathbb{N}$, the set $\{1,2,\dots,r\}$ will be denoted by $[r]$. For sets $A,B\subset V(G)$, $A\Delta B$ denotes the symmetric difference.

For $\mathrm{T}\subset V(G)$ , let $\mathrm{1}_{\mathrm{T}}(\cdot)$ denote the indicator function:

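$$\mathrm{1}_{\mathrm{T}}(v)=\begin{cases}1&\text{if }v\in\mathrm{T},\\ 0&\text{if }v\notin\mathrm{T}.\end{cases}$$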

For $p\in\mathbb{N}$ , we will have $pn$ treated units, and thus $|\mathrm{T}|=pn$ . Set $q=r-p$ . For $\mathrm{T}\subseteq V(G)$ and $v\in V(G),$ let

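$$\chi^{\mathrm{T}}_{v}=\begin{cases}q&\text{if }v\in\mathrm{T},\\ -p&\text{if }v\notin\mathrm{T}.\end{cases}$$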

1.3 Paper guide.

In Section 2 we introduce a basic model for interference and the Neymanian estimator. In Section 3 we discuss some restricted randomizations. Section 4 contains a symmetric interference model. In Section 5 we define our notion of quasi-coloring. We derive the bounds for the MSE of the Neymanian estimator in Section 6. Section 7 introduces a generalization of the symmetric interference model from Section 4. In Section 8 we study the effects of homophily on the treatment effect. The results from a simulation study are given in Section 9. We close with a short discussion. The proofs for various technical results are given in the appendices.

2 The Model and the Estimator.

For each vertex $v\in V(G)$ , let $x_{v},t_{v}\in\mathbb{R}$ be constants and let $f_{v}:2^{\mathcal{N}(v)}\mapsto\mathbb{R}$ be a function such that $f_{v}(\emptyset)=0$ for all $v\in V(G)$ . We study the linear model:

$$y_{v}=x_{v}+\mathrm{1}_{\mathrm{T}}(v)\,t_{v}+f_{v}(\mathrm{T}\cap\mathcal{N}(v)),\quad v\in V(G),$$

where $\mathrm{T}\subset V(G)$ denotes the treatment group. In general, the quantity $x_{v}$ can be thought of as vertex specific attributes, such as covariates. When no covariates are observed, it simply reflects the outcome for node $v$ under control. The function $f_{v}$ denotes the interference effect. For every vertex $v$ , it is only a function of its treated neighbors $\mathrm{T}\cap\mathcal{N}(v)$ .
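As a concrete illustration of the linear model, the following sketch evaluates $y_{v}$ on a small graph. The adjacency-dictionary representation, the function name `outcomes`, the 4-cycle, and the normalized linear interference function with $\gamma=0.5$ are all illustrative assumptions rather than part of the paper.

```python
from typing import Callable, Dict, FrozenSet, Set

# Assumed representation: adjacency sets keyed by integer vertex labels.
Graph = Dict[int, Set[int]]


def outcomes(adj: Graph,
             x: Dict[int, float],
             t: Dict[int, float],
             f: Callable[[int, FrozenSet[int]], float],
             treated: Set[int]) -> Dict[int, float]:
    """Return y_v = x_v + 1_T(v) * t_v + f_v(T ∩ N(v)) for every vertex v."""
    y = {}
    for v, nbrs in adj.items():
        treated_nbrs = frozenset(nbrs & treated)  # T ∩ N(v)
        direct = t[v] if v in treated else 0.0    # 1_T(v) * t_v
        y[v] = x[v] + direct + f(v, treated_nbrs)
    return y


# Hypothetical example: the 4-cycle 0-1-2-3-0.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
x = {v: 1.0 for v in adj}   # baseline outcomes
t = {v: 2.0 for v in adj}   # direct treatment effects
gamma = 0.5


def f(v, treated_nbrs):
    # Normalized linear interference; f_v(∅) = 0 as the model requires.
    return gamma * len(treated_nbrs) / len(adj[v])


y = outcomes(adj, x, t, f, treated={0, 1})
print(y)
```

With this treatment set every vertex of the square has exactly one treated neighbor, so the interference term is the same for treated and control vertices.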

This model (without observed covariates) is a member of the class of neighborhood interference models introduced by [20]. In particular, they demonstrate that this parametrization is equivalent to the potential outcomes notation of [16] under specific assumptions on the additivity and symmetry of the effects. The linear model above corresponds to the additivity of main effects assumption (ANIA), the second most general model in that paper. That is, $x_{v}$ is the baseline, $t_{v}$ is the direct treatment effect (defined as the effect of treatment on node $v$ when no one else is treated) and $f_{v}$ is the interference effect. While Sussman and Airoldi [20] construct new estimators for the average treatment effect, we focus on better designs for the Neymanian estimator defined below.

Define the average direct treatment effect as:

$$\bar{t}=\frac{1}{rn}\sum_{v\in V(G)}t_{v}.$$

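We are interested in estimating $\bar{t}$. Throughout the paper, we will have $n$ groups of $r$ experimental units each, and in each group $p$ units will receive treatment. When $|\mathrm{T}|=pn$, define the Neymanian estimator

$$\widehat{t}_{\mathrm{Neyman}}=\frac{1}{pqn}\Big(q\sum_{v\in\mathrm{T}}y_{v}-p\sum_{v\in V(G)\setminus\mathrm{T}}y_{v}\Big)=\frac{1}{pqn}\sum_{v\in V(G)}\chi^{\mathrm{T}}_{v}y_{v}.$$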

When $p=q=1$ and $r=2$ , the estimator $\widehat{t}_{\mathrm{Neyman}}$ has the usual form:

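$$\widehat{t}_{\mathrm{Neyman}}=\frac{1}{n}\Big(\sum_{v\in\mathrm{T}}y_{v}-\sum_{v\in V(G)\setminus\mathrm{T}}y_{v}\Big).$$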
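The estimator is simple to compute directly from its definition. The sketch below is a minimal implementation under assumed names (`neyman_estimate`, a dictionary of outcomes keyed by vertex) and made-up outcome values; it also checks numerically that the weighted form agrees with the difference between the treated and control means.

```python
from typing import Dict, Set


def neyman_estimate(y: Dict[int, float], treated: Set[int], p: int, q: int) -> float:
    """(1/(pqn)) * (q * sum_{v in T} y_v - p * sum_{v not in T} y_v)."""
    r = p + q
    n, remainder = divmod(len(y), r)
    if remainder != 0 or len(treated) != p * n:
        raise ValueError("need |V| = rn and |T| = pn")
    sum_treated = sum(y[v] for v in treated)
    sum_control = sum(val for v, val in y.items() if v not in treated)
    return (q * sum_treated - p * sum_control) / (p * q * n)


# Hypothetical outcomes for six units with p = 1, q = 2 (so r = 3, n = 2).
y = {0: 3.0, 1: 5.0, 2: 1.0, 3: 2.0, 4: 3.0, 5: 2.0}
treated = {0, 1}
estimate = neyman_estimate(y, treated, p=1, q=2)

# The estimator coincides with the difference in group means.
mean_treated = sum(y[v] for v in treated) / len(treated)
mean_control = sum(y[v] for v in y if v not in treated) / (len(y) - len(treated))
print(estimate, mean_treated - mean_control)
```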

Define the quantity

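$$t_{\mathrm{ideal}}=\frac{1}{pqn}\Big(q\sum_{v\in\mathrm{T}}(x_{v}+t_{v})-p\sum_{v\in V(G)\setminus\mathrm{T}}x_{v}\Big).$$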

The difference

$$\xi=\widehat{t}_{\mathrm{Neyman}}-t_{\mathrm{ideal}}=\frac{1}{pqn}\sum_{v\in V(G)}\chi^{\mathrm{T}}_{v}f_{v}(\mathrm{T}\cap\mathcal{N}(v))$$

is the “average interference effect”. (The quantity $\xi$ is a function of the treatment $\mathrm{T}$, but we suppress this dependence for notational convenience.) Next, we show that bounds on $|\mathbb{E}_{\mathrm{T}}(\xi)|$ lead to bounds on the bias of $\widehat{t}_{\mathrm{Neyman}}$. Here, $\mathbb{E}_{\mathrm{T}}$ denotes that the expectation is taken over the treatment assignment mechanism.

Lemma 2.1.

Suppose that $\mathrm{T}\subset V(G)$ is selected in a fashion so that $\mathbb{P}(v\in\mathrm{T})=\frac{p}{r}$ for all $v\in V(G)$ . Then, $\mathbb{E}_{\mathrm{T}}(\widehat{t}_{\mathrm{Neyman}})-\bar{t}=\mathbb{E}_{\mathrm{T}}(\xi)$ .

Proof.

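The quantity $t_{\mathrm{ideal}}$ defined above can be written as

$$t_{\mathrm{ideal}}=\frac{1}{pqn}\sum_{v\in V(G)}\chi^{\mathrm{T}}_{v}(x_{v}+\mathrm{1}_{\mathrm{T}}(v)\,t_{v}).$$

Since $\mathbb{P}(v\in\mathrm{T})=\frac{p}{r}$ for all $v\in V(G),$ we have $\mathbb{E}_{\mathrm{T}}(\chi^{\mathrm{T}}_{v})=q\cdot\frac{p}{r}-p\cdot\frac{q}{r}=0$ and $\mathbb{E}_{\mathrm{T}}(\chi^{\mathrm{T}}_{v}\mathrm{1}_{\mathrm{T}}(v))=\frac{pq}{r}$, so we obtain that

$$\mathbb{E}_{\mathrm{T}}(t_{\mathrm{ideal}})=\frac{1}{pqn}\sum_{v\in V(G)}\frac{pq}{r}\,t_{v}=\bar{t}.$$

Thus

$$\mathbb{E}_{\mathrm{T}}(\widehat{t}_{\mathrm{Neyman}})-\bar{t}=\mathbb{E}_{\mathrm{T}}(t_{\mathrm{ideal}}+\xi)-\bar{t}=\mathbb{E}_{\mathrm{T}}(\xi),$$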

proving the lemma. ∎
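Lemma 2.1 can be checked exactly on a small example by enumerating every treatment set. The sketch below assumes a completely randomized design on a path with four vertices, linear interference $f_{v}(S)=\gamma|S|$ with $\gamma=1/2$, and arbitrary made-up values for $x_{v}$ and $t_{v}$; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical example: the path 0-1-2-3 with p = q = 1, so r = 2 and n = 2.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
p, q, n = 1, 1, 2
gamma = Fraction(1, 2)
x = {0: Fraction(1), 1: Fraction(4), 2: Fraction(2), 3: Fraction(7)}
t = {0: Fraction(3), 1: Fraction(1), 2: Fraction(5), 3: Fraction(2)}


def f(v, treated_nbrs):
    return gamma * len(treated_nbrs)  # linear interference


def chi(v, treated):
    return q if v in treated else -p


def neyman(treated):
    y = {v: x[v] + (t[v] if v in treated else 0) + f(v, adj[v] & treated)
         for v in adj}
    return sum(chi(v, treated) * y[v] for v in adj) / Fraction(p * q * n)


def xi(treated):
    return sum(chi(v, treated) * f(v, adj[v] & treated)
               for v in adj) / Fraction(p * q * n)


# Completely randomized design: every pn-subset is equally likely,
# so P(v in T) = p/r for each vertex.
assignments = [set(c) for c in combinations(adj, p * n)]
t_bar = sum(t.values()) / len(adj)
bias = sum(neyman(T) for T in assignments) / len(assignments) - t_bar
mean_xi = sum(xi(T) for T in assignments) / len(assignments)
print(bias, mean_xi)
```

The two printed quantities agree, as the lemma predicts, and they are nonzero here: the bias of the estimator under this design is entirely due to interference.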

3 Restricted Randomizations.

Fix a partition $\mathcal{P}=(S_{1},\ldots,S_{n})\in\binom{V}{r,\ldots,r}$ of the vertices $V$ into sets $S_{i}=\{w^{1}_{i},\ldots,w^{r}_{i}\}$ . Define a random vector $\vec{B}=(B_{1},\ldots,B_{n})\in\binom{[r]}{p}^{n}$ with $B_{i}$ i.i.d. uniform on $\binom{[r]}{p}$ . Conditional on the partition $\mathcal{P}$ , we define our treatment assignment mechanism to be:

$$\mathrm{T}_{\vec{B},\mathcal{P}}=\{w^{j}_{i}\mid j\in B_{i}\}.$$

Thus we will give treatment to the vertex $w^{j}_{i}$ when $j\in B_{i}$ .
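Sampling from this mechanism only requires drawing a uniform $p$-subset inside each block. A minimal sketch, assuming the partition is given as a list of blocks of vertex labels (the function name and the example partition are hypothetical):

```python
import random
from typing import Sequence, Set


def sample_assignment(partition: Sequence[Sequence[int]], p: int,
                      rng: random.Random) -> Set[int]:
    """Draw T_{B,P}: treat p uniformly chosen vertices in each block S_i."""
    treated: Set[int] = set()
    for block in partition:
        # rng.sample returns p distinct elements; viewed as a set they form
        # a uniformly random p-subset, playing the role of B_i.
        treated.update(rng.sample(list(block), p))
    return treated


# Hypothetical partition of 6 vertices into n = 2 blocks of size r = 3.
partition = [[0, 1, 2], [3, 4, 5]]
rng = random.Random(0)
treated = sample_assignment(partition, p=1, rng=rng)
print(sorted(treated))
```

Because the blocks are handled independently, exactly $p$ vertices are treated in every block and $|\mathrm{T}|=pn$ holds by construction.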

The usual Completely Randomized Design (CRD) for treatment assignments is recovered when $\mathcal{P}$ is sampled uniformly from the set $\binom{V(G)}{r,\ldots,r}$ . In this section, we obtain bounds for the bias of $\widehat{t}_{\mathrm{Neyman}}$ with treatment group $\mathrm{T}_{\vec{B},\mathcal{P}}$ for a fixed partition $\mathcal{P}\in\binom{V(G)}{r,\ldots,r}$ .

3.1 General upper bound on bias.

The following definition introduces a useful framework for quantifying the variability of the interference effect $f_{v}$ across the units.

Definition 3.1.

The function $f_{v}$, $v\in V(G)$, is called $K_{v}$-Lipschitz if

$$\left|f_{v}(A)-f_{v}(B)\right|\leq\frac{K_{v}|A\Delta B|}{d(v)}$$

for some $K_{v}>0$ and all $A,B\subset\mathcal{N}(v)$.

Thus the Lipschitz constant $K_{v}$ provides an upper bound on the amount that treating a proportion of the neighbors of $v$ can affect $y_{v}$.

Example 3.2.

The linear interference function $f_{v}(A)=\gamma|A|$ for $\gamma\in(0,1)$ is $\gamma d(v)$ -Lipschitz. Moreover, the normalized linear interference function $f_{v}(A)=\gamma\frac{|A|}{d(v)}$ is $\gamma$ -Lipschitz.
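For a small neighborhood the constants in Example 3.2 can be confirmed by brute force over all pairs of subsets. The helper names, the three-element neighborhood and the choice $\gamma=1/2$ below are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations


def all_subsets(items):
    items = list(items)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]


def lipschitz_constant(neighbors, f):
    """Smallest K with |f(A) - f(B)| <= K * |A Δ B| / d(v) for all A, B ⊂ N(v)."""
    d = len(neighbors)
    subsets = all_subsets(neighbors)
    best = Fraction(0)
    for A in subsets:
        for B in subsets:
            if A != B:
                # A ^ B is the symmetric difference of the two frozensets.
                best = max(best, abs(f(A) - f(B)) * d / len(A ^ B))
    return best


gamma = Fraction(1, 2)
neighbors = {"a", "b", "c"}  # hypothetical neighborhood with d(v) = 3
k_linear = lipschitz_constant(neighbors, lambda A: gamma * len(A))
k_normalized = lipschitz_constant(
    neighbors, lambda A: gamma * len(A) / len(neighbors))
print(k_linear, k_normalized)
```

The search returns $\gamma d(v)=3/2$ for the linear function and $\gamma=1/2$ for the normalized one, matching the example.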

The following lemma bounds the bias of $\widehat{t}_{\mathrm{Neyman}}$ with treatment $\mathrm{T}_{\vec{B},\mathcal{P}}$ when $f_{v}$ is Lipschitz. The idea of the proof is to apply Lemma 2.1, which reduces the problem to bounding the expectation of $\xi$. The Lipschitz condition yields a termwise bound on the sum defining $\xi$. Given $v\in V,$ let

$$\mathcal{P}_{v}=S_{i},\quad v\in S_{i},$$

that is, $\mathcal{P}_{v}$ is the block of the partition that contains $v$.

Lemma 3.3.

Suppose the function $f_{v}$ is $K_{v}$ -Lipschitz. Then for the partition $\mathcal{P}$ and the treatment assignment in (LABEL:eqn:tbp), we have

[TABLE]

Lemma 3.3 has the following important corollary:

Corollary 3.4.

If every element of $\mathcal{P}$ is an independent set in $G$ , i.e., if $\{v,v^{\prime}\}\notin E(G)$ whenever $\{v,v^{\prime}\}\subseteq S_{i}$ , and $\mathrm{T}=\mathrm{T}_{\vec{B},\mathcal{P}},$ then $\mathbb{E}_{\mathrm{T}}(\widehat{t}_{\mathrm{Neyman}})=0.$

Proof.

Indeed, in this case, the right-hand side of Lemma 3.3 has no terms. Thus we have $|\mathbb{E}_{\mathrm{T}}(\xi)|=0$ and the proof follows from Lemma 2.1. ∎

Thus Corollary 3.4 implies that if we choose clusters $(S_{i})$ of independent sets and then randomize within those clusters for the treatment assignment, then $\widehat{t}_{\mathrm{Neyman}}$ will be unbiased. Thus a design principle will be to ensure that elements of $\mathcal{P}$ do not contain too many edges of $G$ using appropriate randomizations.
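As a rough illustration of this design principle, the following sketch lists the edges of $G$ that fall inside a block of $\mathcal{P}$; an empty result means every block is an independent set, which is the hypothesis of Corollary 3.4. The helper name `edges_inside_blocks` and the edge-list representation are assumptions made for this example.

```python
from itertools import combinations

def edges_inside_blocks(edges, partition):
    """Return the pairs {u, v} that are edges of G and lie in a common block."""
    edge_set = {frozenset(e) for e in edges}
    return [
        (u, v)
        for block in partition
        for u, v in combinations(block, 2)
        if frozenset((u, v)) in edge_set
    ]

# Hypothetical example: the path 1 - 2 - 3 - 4.
path_edges = [(1, 2), (2, 3), (3, 4)]
print(edges_inside_blocks(path_edges, [(1, 3), (2, 4)]))  # blocks are independent sets
print(edges_inside_blocks(path_edges, [(1, 2), (3, 4)]))  # both blocks contain an edge
```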

3.2 Random choices of $\mathcal{P}$ .

In this section, we assume that the function $f_{v}$ is $K_{v}$ -Lipschitz. Define the average Lipschitz constant

[TABLE]

Example 3.5.

Let $f_{v}(A)=\gamma|A|$ for some $\gamma\in(0,1)$ . Then $f_{v}$ is $K_{v}$ -Lipschitz with $K_{v}=\gamma d(v)$ . When the underlying graph $G$ has average degree $m$ , $\bar{K}=\gamma m$ .

Choosing $\mathcal{P}$ randomly can help reduce the bias, as the following proposition shows. As will be seen in the sequel, it will be helpful to restrict the randomization of $\mathcal{P}$ to reduce the MSE.

Proposition 3.6.

When $\mathcal{P}$ is sampled uniformly from $\binom{V(G)}{r,\ldots,r}$ , we have

[TABLE]

where $\bar{K}$ is as in (LABEL:eqn:kbar).

The following result is immediate from Proposition 3.6.

Corollary 3.7.

When $\mathrm{T}$ is sampled uniformly from $\binom{V(G)}{pn}$ , we have

[TABLE]

Corollary 3.7 generalizes the result of Karwa and Airoldi [12] for the case of $f_{v}(S)=\gamma|S|$ and $p=q=1$ . As mentioned in Example 3.5, when $G$ has average degree $m$ , $\bar{K}=\gamma m$ . Thus by Corollary 3.7, we obtain that $\left|\mathbb{E}_{\mathrm{T}}(\xi)\right|\leq\frac{\gamma m}{2n-1}$ .

4 Symmetric interference model.

In this section, we introduce a simple but natural type of interference function in which the interference effect on a vertex depends only on the numbers of its neighbors that are treated and that are not treated. Set

[TABLE]

and define

[TABLE]

Definition 4.1.

The collection of functions $\{f_{v}:v\in G\}$ is called a symmetric interference model without types if there is a function $f:\mathcal{B}\mapsto\mathbb{R}$ such that

[TABLE]

for all $v\in V(G)$ .

Thus in Definition 4.1, all vertices share the same interference function. In the next section, we will allow different types of vertices to have different interference functions.

Example 4.2.

The family of interference functions $f_{v}(S)=\gamma|S|$ is achieved in a symmetric interference model without types when $f(a,b)=\gamma a.$ A related example is that $f_{v}(S)=\gamma\frac{|S|}{d(v)}$ is achieved in a symmetric interference model when $f(a,b)=\gamma\frac{a}{a+b}$ in Definition 4.1.

Example 4.3.

In many natural examples, after a certain threshold, adding more treated neighbors does not change the interference effect. This can be modeled as $f_{v}(S)=\gamma\min\{|S|,k\}$ (corresponding to interference only due to the first $k$ treated neighbors) and $f_{v}(S)=\gamma\min\left\{\frac{|S|}{d(v)},p/r\right\}$ (corresponding to interference by only the first $p/r$ proportion of treated neighbors). Both of these functions are examples of symmetric interference models.

For $\mathrm{T}\subseteq V(G)$ and $v\in V(G)$ , let

[TABLE]

denote the $\mathrm{T}$ -bidegree of $v$ . Let $\Delta^{0}(\mathcal{B})$ denote the space of finite, signed measures on $\mathcal{B}$ of total mass 0. When $|\mathrm{T}|=pn$ , define the measure $D_{\mathrm{T}}\in\Delta^{0}(\mathcal{B})$ as

[TABLE]

where $\mathcal{B}$ is as defined in (LABEL:eqn:SB). Clearly, $D_{\mathrm{T}}(\mathcal{B})=\sum_{u\in\mathcal{B}}D_{\mathrm{T}}(u)=0$ . When the interference function $f_{v}$ is of symmetric type as in (LABEL:eqn:fsymm), the quantity $\xi$ in Equation (LABEL:eqn:xi) can be expressed compactly as

[TABLE]

5 Perfect quasi-colorings and designs for symmetric interference model.

In this section, we introduce our idea of perfect quasi-colorings, and use these to construct designs for the symmetric interference model. Throughout this subsection, we will assume that $r=2$ and $p=q=1,$ so that the target treatment fraction is $\frac{1}{2}$ .

The following notion of perfect quasi-coloring lets us identify the treatment groups so that the interference effect $\xi$ is identically zero.

Definition 5.1.

A perfect quasi-coloring is a set $Q\in\binom{V(G)}{n}$ that satisfies $D_{Q}=0$ .

The following result implies that $\xi=0$ for the treatment groups $\mathrm{T}=Q$ and $\mathrm{T}=V(G)\setminus Q$ when $Q$ is a perfect quasi-coloring.

Proposition 5.2.

Let $Q\in\binom{V(G)}{n}$ . The following are equivalent in a symmetric model.

• $Q$ is a perfect quasi-coloring.

• $V(G)\setminus Q$ is a perfect quasi-coloring.

• If $\mathrm{T}=Q$ , for every function $f_{v}$ of the form (LABEL:eqn:fsymm), $\xi=0$ .

• If $\mathrm{T}=V(G)\setminus Q$ , for every function $f_{v}$ of the form (LABEL:eqn:fsymm), $\xi=0$ .

• If the treatment $\mathrm{T}$ is chosen uniformly and randomly between $Q$ and $V(G)\setminus Q$ , for every function $f_{v}$ of the form (LABEL:eqn:fsymm), $\xi=0$ .

Remark 5.3.

Intuitively, randomizing between $\mathrm{T}=Q$ and $\mathrm{T}=V(G)\setminus Q$ when $Q$ is a perfect quasi-coloring makes $\widehat{t}_{\mathrm{Neyman}}$ unbiased because (1) interference effects cancel and (2) each vertex is treated with probability $\frac{1}{2}$ , so that each treatment effect enters the estimate with probability $\frac{1}{2}$ .

Proof.

First, we will show that $Q$ is a perfect quasi-coloring if and only if $V(G)\setminus Q$ is. Define $\tau:\mathcal{B}\to\mathcal{B}$ by $\tau(a,b)=(b,a)$ . Let $\tau_{*}D_{Q}$ be the push-forward measure of $D_{Q}$ by the function $\tau$ . By construction, it follows that $\tau_{*}D_{Q}=-D_{V(G)\setminus Q}.$ Thus we conclude that $D_{Q}=0$ if and only if $D_{V(G)\setminus Q}=0.$

Next, we will prove that $Q$ is a perfect quasi-coloring if and only if $\xi=0$ for all $f$ when $\mathrm{T}=Q$ . Since $\xi=\int_{\mathcal{B}}f\,dD$ by Equation (LABEL:eqn:intxi), this assertion follows immediately. The proposition follows because the distribution of $\xi$ with treatment chosen uniformly and randomly between $Q$ and $V(G)\setminus Q$ is a 50–50 mixture of point masses at the values of $\xi$ with $T=Q$ and $T=V(G)\setminus Q$ . ∎

The following example shows that highly homogeneous graphs admit perfect quasi-colorings.

Example 5.4 (Perfect quasi-colorings exist in the graph consisting of copies of a smaller graph).

Let $H$ be an arbitrary graph with $|V(H)|>1$ . Let $G=H\times\{0,1\}^{V(H)}$ . We claim that

[TABLE]

is a perfect quasi-coloring of $G$ . To see this, define an involution $\psi:V(G)\to V(G)$ by

[TABLE]

Note that, for all $w\in V(G),$ exactly one of $w$ and $\psi(w)$ is in $Q$ and $w$ and $\psi(w)$ have the same number of neighbors in $Q$ (resp. $V(G)\setminus Q$ ). It follows that $D_{Q}=0.$

The class of graphs considered by Example 5.4 is quite specific. Unfortunately, not even $2k$ -regular graphs need to admit a perfect quasi-coloring, as the following example shows.

Example 5.5 (A hexagon does not have a perfect quasi-coloring).

Let $G$ be a hexagon. Thus $V(G)=\{1,2,\ldots,6\}$ with an edge drawn between $i$ and $i+1$ modulo 6 for all $i$ . Let $B\in\binom{V(G)}{3}$ .

We claim that $B$ is not perfect. Indeed, if $B$ contains three consecutive elements of $V(G),$ then the support of $D_{B}$ contains $(2,0)$ . If $B$ does not contain any three consecutive elements of $V(G),$ then the support of $D_{B}$ contains $(0,2)$ . In either case, we have $D_{B}\not=0$ . This example motivates studying other estimators in addition to $\widehat{t}_{\mathrm{Neyman}}$ ; see Discussion for more on this point.
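The claim of Example 5.5 can be checked by brute force. The sketch below assumes that the $\mathrm{T}$ -bidegree of $v$ is the pair (number of treated neighbors, number of untreated neighbors) and that, for $|Q|=n$ with $r=2$ , the condition $D_{Q}=0$ amounts to the treated and control vertices having the same multiset of bidegrees. The function names and the adjacency-dictionary representation are illustrative choices, not the paper's.

```python
from collections import Counter
from itertools import combinations

def bidegree(adj, v, treated):
    """(number of treated neighbours, number of untreated neighbours) of v."""
    a = sum(1 for u in adj[v] if u in treated)
    return (a, len(adj[v]) - a)

def is_perfect_quasi_coloring(adj, Q):
    """Check that treated and control vertices share the same bidegree multiset."""
    Q = set(Q)
    inside = Counter(bidegree(adj, v, Q) for v in adj if v in Q)
    outside = Counter(bidegree(adj, v, Q) for v in adj if v not in Q)
    return inside == outside

def cycle(m):
    """Adjacency dictionary of the cycle on vertices 0, ..., m-1."""
    return {i: [(i - 1) % m, (i + 1) % m] for i in range(m)}

hexagon = cycle(6)
perfect = [Q for Q in combinations(hexagon, 3)
           if is_perfect_quasi_coloring(hexagon, Q)]
print(perfect)  # no 3-subset of the hexagon qualifies

square = cycle(4)
print(is_perfect_quasi_coloring(square, {0, 1}))  # two adjacent vertices of a 4-cycle
```

For contrast, two adjacent vertices of a 4-cycle do pass the check, since every vertex then has exactly one treated and one untreated neighbor.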

Example 5.5 suggests that it might not be fruitful to search for perfect quasi-colorings in arbitrary graphs. In general, we can only hope to control the size of $\xi$ . Proposition 5.2 yields that $\xi=0$ for a perfect quasi-coloring. It is then natural to ask whether an “almost perfect quasi-coloring” will imply that the corresponding $\xi$ is close to zero. In the next section, we show that this is indeed the case, quantify this intuition, and use it to construct new designs.

5.1 Quantifying the notion of perfect quasi-coloring.

Let $\mathbf{d}$ be a metric on $\mathcal{B}$ . For $f:\mathcal{B}\mapsto\mathbb{R}$ , define the Lipschitz norm

[TABLE]

For a measure $D\in\Delta^{0}(\mathcal{B})$ , define the Wasserstein norm

[TABLE]

Since the total mass is $0$ for any $D\in\Delta^{0}(\mathcal{B})$ , we have that

[TABLE]

where $\|-\|_{\mathrm{TV}}$ denotes the total variation norm. From Equation (LABEL:eqn:intxi), we can deduce that if the interference function $f:\mathcal{B}\mapsto\mathbb{R}$ is Lipschitz with respect to a metric $\mathbf{d}$ , then

[TABLE]

For a treatment assignment $\mathrm{T}$ that is a perfect quasi-coloring, we have $D_{\mathrm{T}}=0$ and thus $\xi=0$ . Equation LABEL:eqn:xiLipbd shows that $\xi$ is continuous in $\|D_{\mathrm{T}}\|_{\mathbf{d}_{\mathrm{w}}}$ .

While (LABEL:eqn:xiLipbd) holds for any metric $\mathbf{d}$ , we will use the following metric $\mathbf{d}=\mathbf{d}_{K}$ :

Definition 5.6.

Fix $K=(K_{1},K_{2})$ with $K_{1}\geq 0$ and $K_{2}>0$ , define the metric $\mathbf{d}_{K}$ on $\mathcal{B}$ : for all $(a,b),(c,d)\in\mathcal{B}$ ,

[TABLE]

where $d_{\max}$ is as in (LABEL:eqn:dmin).

Remark 5.7.

Since we assume that $G$ does not have any isolated vertices, $\mathbf{d}_{K}$ is indeed a metric on $\mathcal{B}$ .

Remark 5.8.

The choice of a metric is crucial for our estimates. The main point here is that the chosen metric must capture the key features of the interference model. To measure the similarity of two vertices, the metric $\mathbf{d}_{K}$ in (LABEL:eqn:nuK) takes only the differences in the fraction of treated neighbors and the differences of the degrees between the vertices. This is justified here because the symmetric interference model by definition depends only on these quantities. Different metrics could be used for other choices of interference functions.

For $\mathcal{P}=(S_{1},\ldots,S_{n})\in\binom{V(G)}{r,\ldots,r}$ , define the constant

[TABLE]

The following proposition bounds the $L^{2}$ norm of $\|D_{\mathrm{T}}\|_{\mathbf{d}_{\mathrm{w}}}$ :

Proposition 5.9.

Fix $\mathcal{P}\in\binom{V(G)}{r,\ldots,r}$ and let $\mathrm{T}=\mathrm{T}_{\vec{B},\mathcal{P}}$ as in (LABEL:eqn:tbp). We have

[TABLE]

where $\mathcal{P}_{v}$ is as in (LABEL:eqn:pv).

The idea behind the proof of Proposition 5.9 is to bound the contributions of each vertex to the left-hand-side, and use the fact that $\mathrm{T}\cap S_{i}$ and $\mathrm{T}\cap S_{j}$ are independent for $i\not=j,$ where $\mathcal{P}=(S_{1},\ldots,S_{n})$ .

Proposition 5.9 and Equation (LABEL:eqn:xiLipbd) imply the following upper bound on the $L^{2}$ norm of $\xi$ for the treatment $\mathrm{T}=\mathrm{T}_{\vec{B},\mathcal{P}}$ .

Corollary 5.10.

Let the interference function $f:\mathcal{B}\mapsto\mathbb{R}$ be such that $\|f\|_{\mathbf{d}_{K}}\leq 1$ . Then

[TABLE]

for all $\mathcal{P}$ when $\mathrm{T}=\mathrm{T}_{\vec{B},\mathcal{P}}$ .

Remark 5.11.

In the case of a complete graph on $rn$ vertices, we have $C_{{\mathcal{P}}}=0$ and hence

[TABLE]

Thus, for fixed $p,q,r,$ we have $\sqrt{\mathbb{E}_{\vec{B}}|\xi|^{2}}=O(n^{-1/2})$ .

6 New Designs and MSE for $\widehat{t}_{\mathrm{Neyman}}$ .

In this section, we will use the idea of perfect quasi-coloring and Proposition 5.9 to construct new designs for $\widehat{t}_{\mathrm{Neyman}}$ and derive bounds for its MSE. We study the dense ( $d_{\min}\rightarrow\infty$ as $n\rightarrow\infty$ ) and sparse ( $d_{\min}=O(1)$ and $d_{\max}\rightarrow\infty$ as $n\rightarrow\infty$ ) cases separately, since our methods and assumptions are different for dense vs. sparse graphs.

6.1 Dense Graphs.

A key term appearing on the right-hand side of Proposition 5.9 is the constant $C_{{\mathcal{P}}}$ , which is solely a function of the partition $\mathcal{P}$ . Thus we seek designs that lead to smaller values of $C_{{\mathcal{P}}}$ . To this end, we introduce the following new design, which we call “partitioning by degree”:

Definition 6.1.

Let $\{w^{*}_{i}\},1\leq i\leq rn$ be an enumeration of the vertices of $G$ such that $d(w^{*}_{i})\geq d(w^{*}_{i^{\prime}})$ whenever $i>{i^{\prime}}$ . Choose

[TABLE]

for $1\leq i\leq n$ . Finally set

[TABLE]

Thus the partition $\mathcal{P}^{*}$ is chosen by first rank ordering the vertices by degree and then pairing vertices of similar degree. The following is a key observation:

Lemma 6.2.

For $\mathcal{P}^{*}$ chosen according to partitioning by degree as in Definition 6.1, we have

[TABLE]

where $C_{\mathcal{P}^{*}}$ is the corresponding constant in Corollary 5.10.

Proof.

Breaking each appearance of $d(v)-d(v^{\prime})$ in (LABEL:eqn:cp) into a sum of terms of the form $d(w^{*}_{k})-d(w^{*}_{k+1}),$ we have

[TABLE]

as desired. ∎
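A minimal sketch of partitioning by degree follows. It assumes, in line with the verbal description of Definition 6.1, that the blocks are consecutive runs of $r$ vertices in the degree ordering; the function name and the adjacency-dictionary input are illustrative, and ties between equal degrees are broken by the input order.

```python
def partition_by_degree(adj, r):
    """Sort vertices by degree and cut the ordering into consecutive blocks of size r."""
    order = sorted(adj, key=lambda v: len(adj[v]))  # non-decreasing degree, stable on ties
    if len(order) % r != 0:
        raise ValueError("number of vertices must be divisible by r")
    return [tuple(order[i:i + r]) for i in range(0, len(order), r)]

# Hypothetical example: the path 0 - 1 - 2 - 3 has degrees 1, 2, 2, 1.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(partition_by_degree(path, r=2))  # endpoints paired together, interior vertices paired together
```

Within each block the degrees differ as little as the ordering allows, which is what keeps the constant $C_{\mathcal{P}^{*}}$ small.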

As an immediate consequence of Lemma 6.2, Lemma 3.3, Proposition 5.9, and Corollary 5.10, we have the following bound:

Theorem 6.3.

When $\mathrm{T}=\mathrm{T}_{\vec{B},\mathcal{P}^{*}}$ ,

[TABLE]

where $d_{\min}$ is as in (LABEL:eqn:dmin). If in addition, the interference function $f$ satisfies $\|f\|_{\mathbf{d}_{K}}\leq 1$ , then

[TABLE]

Theorem 6.3 immediately yields that interference does not affect the consistency of the estimator $\widehat{t}_{\mathrm{Neyman}}$ for our randomized design when $G$ grows large and dense.

Corollary 6.4.

Let $\mathrm{T}=\mathrm{T}_{\vec{B},\mathcal{P}^{*}}$ and fix $p,q,r\in\mathbb{N}$ . If $d_{\min}\to\infty$ as $n\rightarrow\infty$ , then the mean squared error of $\widehat{t}_{\mathrm{Neyman}}$ goes to zero as $n\to\infty$ .

The following example shows that the restricted randomization $\mathrm{T}_{\vec{B},\mathcal{P}^{*}}$ , where $\mathcal{P}^{*}$ is obtained by partitioning by degree as in (LABEL:eqn:Pstar), can significantly outperform the CRD in terms of reducing the mean squared error of $\widehat{t}_{\mathrm{Neyman}}$ .

Example 6.5.

Let $p=q=1.$ Let $V(G)=\{v_{1},\ldots,v_{2k},w_{1},\ldots,w_{2k}\}$ , and let the edges of $G$ be $\{v_{i},v_{j}\}$ . Thus, $G$ is the disjoint union of a complete graph on $2k$ vertices $V=\{v_{1},\ldots,v_{2k}\}$ with $2k$ additional vertices $W=\{w_{1},\ldots,w_{2k}\}$ . Consider a symmetric linear interference model $f(a,b)=\gamma a$ .

Fix $\mathrm{T}\in\binom{V(G)}{n}$ and let $\alpha=\alpha(T)=|T\cap V|$ . It is straightforward to verify that

[TABLE]

When $\mathrm{T}$ is chosen uniformly and randomly from $\binom{V(G)}{n}$ , by the CLT, we have that

[TABLE]

as $n\to\infty$ . While $\mathbb{E}_{\mathrm{T}}\xi\to 0$ as $n\to\infty$ , it can be verified using the formulae for higher moments of normal distributions that

[TABLE]

as $n\to\infty$ .

On the other hand, note that any $\mathcal{P}^{*}$ according to (LABEL:eqn:Pstar) consists of a partition of $V$ into pairs and a partition of $W$ into pairs. Therefore, when $\mathrm{T}=\mathrm{T}_{\vec{B},\mathcal{P}^{*}},$ we have $\alpha=k$ and hence $\xi=-\frac{\gamma}{2}$ .

Of course in the above example, the graph $G$ contains isolated vertices $\{w_{1},\dots,w_{2k}\}$ . The conclusions noted above are qualitatively the same if we add some small number of edges between $\{v_{1},\cdots,v_{2k}\}$ and $\{w_{1},\dots,w_{2k}\}$ , with $d_{\min}\rightarrow\infty$ at a sufficiently slow rate.

Example 6.5 illustrates that our treatment design can improve on the completely random design when there is a high degree of heterogeneity in the degrees of vertices.
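The contrast in Example 6.5 can be seen numerically by tracking $\alpha=|\mathrm{T}\cap V|$ under both designs. The sketch below is a simulation under assumed settings ( $k=50$ , 2000 draws, a fixed seed); the helper names are hypothetical.

```python
import random

def alpha(treated):
    """Number of treated vertices that lie in the clique V."""
    return sum(1 for kind, _ in treated if kind == "v")

k = 50
V = [("v", i) for i in range(2 * k)]   # clique vertices, degree 2k - 1
W = [("w", i) for i in range(2 * k)]   # isolated vertices, degree 0
vertices = V + W                       # already ordered by degree
pairs = [(vertices[i], vertices[i + 1]) for i in range(0, 4 * k, 2)]

rng = random.Random(1)
# CRD: treat a uniformly random half of all 4k vertices.
crd = [alpha(rng.sample(vertices, 2 * k)) for _ in range(2000)]
# Restricted design: treat one uniformly chosen vertex in each degree-matched pair.
restricted = [alpha([rng.choice(pair) for pair in pairs]) for _ in range(2000)]

print(min(crd), max(crd))   # alpha fluctuates around k under the CRD
print(set(restricted))      # alpha equals k in every draw
```

Since no pair straddles $V$ and $W$ , the restricted design fixes $\alpha=k$ , whereas under the CRD $\alpha$ fluctuates from draw to draw.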

6.2 Sparse graphs

For sparse graphs, the bias bounds implied by Theorem 6.3 are somewhat weak. In this setting, it is helpful to randomize over all choices of $\mathcal{P}^{*}$ in order to reduce bias. To this end, we introduce the following randomized version of the design introduced in Definition 6.1:

Definition 6.6.

Let $S\subseteq V(G)$ be such that no $r$ vertices in $S$ have the same degree and the number of vertices in $V(G)\setminus S$ of each degree is divisible by $r$ . Let $\mathcal{P}^{**}_{0}$ be sampled uniformly from the set of partitions of $V(G)\setminus S$ into sets of $r$ vertices of the same degree. Let $S=\{w_{1},\ldots,w_{rk}\}$ with

[TABLE]

Let

[TABLE]

Thus the main difference between the designs in Definitions 6.1 and 6.6 is that in the latter, we randomize over all vertices with the same degree instead of merely fixing a partial ordering. Our MSE bounds rely on the following simple observations:

Lemma 6.7.

For a sequence of random variables $X_{1},\ldots,X_{k}$ ,

[TABLE]

Proof.

For all $x,y\geq 0$ , we have $2\sqrt{xy}\leq x+y$ . It follows that

[TABLE]

for all $i,j$ . Thus, we have

[TABLE]

as desired. ∎

The following simple lemma is crucial to the proof of Proposition 7.8 below.

Lemma 6.8.

Let $X_{1},\ldots,X_{n}$ be real valued random variables. Suppose that for each $i,$ there exist at most $\kappa$ indices $j$ such that $X_{i}$ and $X_{j}$ are not independent. Then, we have

[TABLE]

Proof.

Since independent random variables are uncorrelated, for each index $i$ , there exist at most $\kappa$ indices $j$ such that $\operatorname{Corr}(X_{i},X_{j})\not=0$ . It follows that $\sum_{j=1}^{n}|\operatorname{Corr}(X_{i},X_{j})|\leq\kappa$ for all $i$ . The lemma thus follows from Lemma 6.7. ∎

We now give the MSE bounds for $\xi$ :

Proposition 6.9.

If $\|f\|_{\mathbf{d}}\leq 1,$ then

[TABLE]

where $\mathrm{T}=\mathrm{T}_{\vec{B},\mathcal{P}^{**}}$ and $\mathcal{P}^{**}$ is as in Definition 6.6.

Proposition 6.9 immediately yields the following MSE bounds for $\widehat{t}_{\mathrm{Neyman}}$ in the sparse regime:

Corollary 6.10.

When $\mathrm{T}=\mathrm{T}_{\vec{B},\mathcal{P}^{**}}$ , with $d_{\max}=o(\sqrt{n})$ and $d_{\min}=O(1)$ , the $\mathrm{MSE}$ of $\widehat{t}_{\mathrm{Neyman}}$ is $o(1).$

7 Interference with types.

In this section we define a generalization of the symmetric interference model discussed in Section 4 and derive the MSE bounds for $\widehat{t}_{\mathrm{Neyman}}$ under this model.

Definition 7.1.

The function $f_{v}$ is a symmetric interference model with types if there exists a partition $\Pi$ of $V(G)$ into sets of even sizes such that there are functions $(f_{\pi})_{\pi\in\Pi}$ with

[TABLE]

for all $v\in V(G)$ . Here, $f_{\pi}$ is real-valued with domain

[TABLE]

Remark 7.2.

The case of $\Pi=\{V(G)\}$ recovers the symmetric interference model (without types) in Definition 4.1.

Let $\Delta^{0}(\mathcal{B}_{\pi})$ denote the space of finite, signed measures on $\mathcal{B}_{\pi}$ of total mass 0. When $\pi\subseteq V(G)$ is such that $|\mathrm{T}\cap\pi|=\frac{p|\pi|}{r},$ let

[TABLE]

7.1 Perfect quasi-colorings for interference with types.

The structure of perfect quasi-colorings extends to the setting of interference models with types. For this subsection, we assume that $p=q=1,$ so that the target treatment fraction is $\frac{1}{2}$ . The analogue of Definition 5.1 is:

Definition 7.3.

A perfect quasi-coloring of $G$ with respect to the type partition $\Pi$ is a set $B\in\binom{V(G)}{n}$ that satisfies $D^{\pi}_{B}=0$ and $|B\cap\pi|=|\pi|/2$ for all $\pi\in\Pi$ .

Definition 7.3 recovers Definition 5.1 by taking $\Pi=\{V(G)\}$ . The analogue of Proposition 5.2 is:

Proposition 7.4.

Let $B\in\binom{V(G)}{n}$ be such that $|B\cap\pi|=|\pi|/2$ for all $\pi\in\Pi$ . The following are equivalent in a symmetric model.

• $B$ is a perfect quasi-coloring.

• $V(G)\setminus B$ is a perfect quasi-coloring.

• $\xi=0$ for all $(f_{\pi})_{\pi\in\Pi}$ with treatment $T=B$ .

• $\xi=0$ for all $(f_{\pi})_{\pi\in\Pi}$ with treatment $T=V(G)\setminus B$ .

• $\xi=0$ for all $(f_{\pi})_{\pi\in\Pi}$ with treatment chosen uniformly and randomly between $B$ and $V(G)\setminus B$ .

The proof of Proposition 7.4 is similar to the proof of Proposition 5.2, as

[TABLE]

Example 5.5 shows that perfect quasi-colorings need not exist in general, while the following example generalizes Example 5.4 to exhibit a class of graphs and type partitions in which perfect quasi-colorings exist.

Example 7.5 (Perfect quasi-colorings exist in the graph consisting of copies of a smaller graph).

Let $H$ be an arbitrary graph with $|V(H)|>1$ , and let $\Pi_{0}$ be a partition of the vertices of $H$ . Let $G=H\times\{0,1\}^{V(H)}$ , and define a partition $\Pi$ of $V(G)$ by

[TABLE]

It is straightforward to verify that

[TABLE]

is a perfect quasi-coloring of $G$ with respect to the type partition $\Pi$ .

7.2 Semi-restricted randomization

For each $\pi\in\Pi,$ let $\mathrm{T}_{\pi}$ be drawn uniformly and randomly from $\binom{\pi}{p|\pi|/r},$ with $(\mathrm{T}_{\pi})_{\pi\in\Pi}$ independent. Define $T_{\Pi}=\bigcup_{\pi\in\Pi}\mathrm{T}_{\pi}.$

We can represent this treatment group in terms of a restricted randomization treatment group as follows. Let $\mathcal{P}$ be sampled uniformly from $\binom{\Pi}{r,\ldots,r}$ (independently of $\vec{B}$ ). Then, $\mathrm{T}_{\vec{B},\mathcal{P}}$ has the same distribution as $\mathrm{T}_{\Pi}$ .
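The direct description of $\mathrm{T}_{\Pi}$ translates into a short sampler. This is a sketch only; the function name and the representation of $\Pi$ as a list of vertex lists are assumptions of the example.

```python
import random

def semi_restricted_randomization(types, p, r, rng=None):
    """Independently treat a uniform fraction p/r of the vertices of each type."""
    rng = rng or random.Random()
    treated = set()
    for members in types:
        members = list(members)
        if (p * len(members)) % r != 0:
            raise ValueError("p * |type| must be divisible by r")
        treated.update(rng.sample(members, p * len(members) // r))
    return treated

# Hypothetical type partition with two types of sizes 4 and 6, and p/r = 1/2.
types = [[0, 1, 2, 3], [4, 5, 6, 7, 8, 9]]
T = semi_restricted_randomization(types, p=1, r=2, rng=random.Random(0))
print(sorted(T))
```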

7.3 MSE bounds

In this section, we will use the following metric $\mathbf{d}$ :

Definition 7.6.

Fix $K>0$ , define the metric $\mathbf{d}$ on $\mathcal{B}_{\pi}$ : for all $(a,b),(c,d)\in\mathcal{B}_{\pi}$ ,

[TABLE]

where $d_{\max}$ is as in (LABEL:eqn:dmin).

The analogue of Proposition 5.9 in this setting is:

Proposition 7.7.

For all $\mathcal{P}\in\binom{\Pi}{r,\ldots,r},$ we have

[TABLE]

when $\mathrm{T}=\mathrm{T}_{\vec{B},\mathcal{P}}$ . If $\|f_{\pi}\|_{\mathbf{d}}\leq 1$ for all $\pi\in\Pi,$ then

[TABLE]

when $\mathrm{T}=\mathrm{T}_{\vec{B},\mathcal{P}},$ hence also when $\mathrm{T}=\mathrm{T}_{\Pi}$ .

The analogue of Proposition 6.9 is:

Proposition 7.8.

If $\|f_{\pi}\|_{\mathbf{d}}\leq 1$ for all $\pi\in\Pi,$ then

[TABLE]

when $\mathrm{T}=\mathrm{T}_{\Pi}$ .

Thus, in the sparse setting, it is important that $\Pi$ is not too large, i.e., that there are not too many different types of vertices. The condition that $|\Pi|$ not be too large is analogous to the condition that $d_{\max}-d_{\min}$ not be too large implicit in the statement of Proposition 6.9.

8 Homophily and types.

In this section, we directly bound the MSE of $\widehat{t}_{\mathrm{Neyman}}$ in a model that allows homophily between vertices in a single element of $\Pi$ .

For $\pi\in\Pi,$ let

[TABLE]

be the average covariate effect and average treatment effect respectively within a type. If homophily is suspected, one expects that $x_{v}$ will be close to $x_{\pi}$ and $t_{v}$ will be close to $t_{\pi}$ within a type. To that end for $v\in V(G),$ let

[TABLE]

be the discrepancy between an individual node’s behavior and their type average. Then

[TABLE]

captures the sum of squared differences between nodes and their type averages within a graph. Thus $\sigma^{2}$ has an inverse relationship with homophily. The following result, which generalizes Lemma 2.1, bounds the MSE of $t_{\mathrm{ideal}}$ .

Proposition 8.1.

For all partitions $\Pi$ of $V(G)$ into sets of size divisible by $r$ , we have

[TABLE]

when $\mathrm{T}=\mathrm{T}_{\Pi}$ .

Coupling Proposition 8.1 with bounds on $\xi$ yields the following bias and MSE bounds for $\widehat{t}_{\mathrm{Neyman}}$ when there is homophily.

Corollary 8.2.

If $\|f_{\pi}\|_{\mathbf{d}}\leq 1$ for all $\pi\in\Pi,$ then

[TABLE]

when $\mathrm{T}=\mathrm{T}_{\Pi}$ .

Proof.

Follows from Propositions 7.7, 7.8, and 8.1. ∎

The results of this section are closely related to the work of Basse and Airoldi [3] on optimal design with network correlated outcomes that are induced by homophily but no interference.

9 Simulations.

In this section we conduct a series of simulations to demonstrate the efficacy of the approach. We vary the following parameters: the type of the network and the strength of the interference. For each of the simulations we consider the following model:

[TABLE]

where $x_{v}\overset{iid}{\sim}\mathrm{N}(0,1)$ and $t_{v}\overset{iid}{\sim}\mathrm{N}(2,0.25)$ . That is, the baseline outcome for all of the individuals in the graph is centered at 0 with a variance of 1, while the treatment effect for everyone is centered at 2 with a variance of 0.25. We consider two treatment regimes: our approach (described in Section 6) and the completely randomized design where exactly half of all units are treated randomly. We report log mean squared errors (log MSE) for the Neymanian estimator in Figure 2 for the two sets of simulations. The MSEs are calculated over 10000 simulated randomizations for each approach.
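To make the simulation setup concrete, here is a minimal sketch of one draw. It assumes an additive outcome model (baseline $x_{v}$ , plus $t_{v}$ if $v$ is treated, plus a linear interference term $\gamma$ times the number of treated neighbors) and takes the Neymanian estimator to be the difference between the treated and control sample means. Both the function names and the exact form of the model are assumptions of this example.

```python
import random
from statistics import fmean

def outcomes(adj, treated, x, t, gamma):
    """Assumed model: baseline + own effect if treated + gamma * (# treated neighbours)."""
    return {
        v: x[v] + (t[v] if v in treated else 0.0)
           + gamma * sum(1 for u in adj[v] if u in treated)
        for v in adj
    }

def neyman_estimate(y, treated):
    """Difference between the treated and control sample means."""
    return (fmean(y[v] for v in y if v in treated)
            - fmean(y[v] for v in y if v not in treated))

# Hypothetical example on a 4-cycle with adjacent vertices 0 and 1 treated.
adj = {0: [3, 1], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
rng = random.Random(0)
x = {v: rng.gauss(0.0, 1.0) for v in adj}   # baseline: mean 0, variance 1
t = {v: rng.gauss(2.0, 0.5) for v in adj}   # effect: mean 2, variance 0.25 (sd 0.5)
y = outcomes(adj, {0, 1}, x, t, gamma=0.5)
print(neyman_estimate(y, {0, 1}))
```

In this toy configuration every vertex has exactly one treated neighbor, so the interference term contributes equally to both group means and cancels in the difference.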

9.1 Erdos-Renyi graphs.

In this simulation we generate a graph $G\sim\text{ER}(N,p)$ with $N$ nodes and overall density $p$ . This is an independent edge random graph model where an edge between node $v$ and $v^{\prime}$ exists with probability $p$ . We consider the symmetric linear interference function $f_{v}(A)=\gamma|A|$ . We consider three graph sizes: $100$ , $200$ and four graph densities: $0.05$ , $0.1$ , $0.5$ . The parameter $\gamma$ varies from $0.1$ to $2$ .
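For reference, the independent-edge model can be sampled as follows. This is a plain standard-library sketch with an illustrative function name, not the code used to produce the paper's figures.

```python
import random
from itertools import combinations

def erdos_renyi(N, p, rng=None):
    """Sample ER(N, p): each of the N*(N-1)/2 possible edges appears independently w.p. p."""
    rng = rng or random.Random()
    adj = {v: set() for v in range(N)}
    for u, v in combinations(range(N), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

G = erdos_renyi(100, 0.1, random.Random(0))
mean_degree = sum(len(nbrs) for nbrs in G.values()) / len(G)
print(mean_degree)  # close to (N - 1) * p on average
```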

An important quality of Erdos-Renyi graphs is that they are extremely dense (expected degree is $Np$ ) and the degrees of their nodes concentrate [14]. Further, this model gives rise to large cliques within the graph implying that many nodes have the exact same degree and are connected to each other [4]. Because of these traits a randomization scheme based on the degree distribution of the graph is unlikely to perform well. In fact, our proposed procedure behaves similarly to the standard Bernoulli randomization scheme. This behavior is evident in Figure 2(a) where both estimators have approximately the same log MSE with the CRD even exhibiting better behavior for denser graphs and higher levels of interference (such as $p=0.5,\ \gamma=2$ ).

9.2 Preferential attachment graphs.

In this simulation we generate a graph $G\sim\mathrm{PA}(N,\mathrm{pow},m)$ with $N$ nodes, power $\mathrm{pow}$ of the preferential attachment (PA), and $m$ new edges at each step of the graph growth [2]. These graphs are constructed by starting with a single vertex and adding one new vertex at a time. The new vertex forms an edge with an existing vertex $v$ with probability proportional to $d(v)^{\mathrm{pow}}$ . Each new vertex forms $m$ new edges. This process continues until there are $N$ vertices in the graph. These graphs have power law degree distributions and hence are sparse with many small degree nodes and a few large hubs.

It is clear that log MSE increases with power since it produces denser graphs that are more likely to have too many nodes with the same degree. However, the behavior with respect to $m$ is more complicated. In Figure 2(b) we see the log MSE of the estimator based on CRD increase in $m$ for all levels of $\mathrm{pow}$ . However, this is not the case for the restricted randomization. This behavior is likely explained by the special behavior of super-linear preferential attachment [13]. When $\mathrm{pow}=4$ and $m=4$ most graphs have four central nodes that are connected to everyone else. As such, only these central nodes induce any form of interference on the other nodes, and so the restricted randomization allocates treatment ideally. The CRD does not take this structure into account and so is likely to allocate all of the central nodes to treatment or control, leading to increased bias and variance. When $m=6$ there are enough perturbations in the system to lead to poorer performance by the restricted randomization. On the other hand, when $\mathrm{pow}\in(1,2]$ , a small $m$ frequently leads to the creation of an odd number of central nodes while a large $m$ produces a large amount of heterogeneity in the degrees. In this setting, the restricted randomization approach prefers more heterogeneity as it balances the interference among nodes. In all of these settings, the CRD performs worse than the restricted randomization.

10 Discussion.

This article provided a new approach to bounding the bias and mean squared error of the Neymanian estimator of the average treatment effect under interference and homophily. It introduced the notion of quasi-coloring to better understand the balance needed in the randomization scheme to account for interference. Based on this construct we developed a restricted randomization scheme that has good theoretical properties and performs well in simulations. There are a number of directions for future research.

The general notion of perfect quasi-coloring provides an intuition for constructing other linear unbiased estimators. For example, we can construct a partial-perfect-quasi-coloring by only treating one node. This produces the following unbiased estimator: $Y_{\mathrm{treated}}-\bar{Y}_{c}$ , where $\bar{Y}_{c}$ is the average outcome of all the control units who are not neighbors of the treated unit. The weights associated with treated and control units are still interpretable.

It is also possible to develop the machinery in this paper for other estimands and estimators of interest. However, this requires even greater care. For example, we could be interested in the interference effect of exactly one treated neighbor — this lends itself naturally to specifying several naive Neyman-type estimators: only consider control (treated) nodes who have one treated neighbor versus control (treated) nodes who have no treated neighbors, or some combination of both. In turn, this suggests particular restrictions on the randomization scheme. More general versions of this approach can be studied for less constrained types of interference.

Acknowledgements.

The authors thank Edoardo Airoldi, Dean Eckles, Vishesh Karwa and Daniel Sussman for helpful conversations. Part of this research was conducted while RJ was an Economic Design Fellow at the Harvard Center of Mathematical Sciences and Applications. NSP was partially supported by an ONR grant. AV was partially supported by an NSF MSPRF.

Appendix A Bounds on bias

For $v\in V(G)$ and $\mathrm{T}\subseteq\binom{V(G)}{n},$ let

[TABLE]

denote the interference effect on $v$ , so that $\xi$ in (LABEL:eqn:xi) can be expressed as

[TABLE]

Given $v,w\in V(G)$ , define the weight of $w$ on $v$ as

[TABLE]

Lemma A.1.

For a partition $\mathcal{P}=\{S_{1},\ldots,S_{n}\}\in\binom{V(G)}{r,\ldots,r}$ and the treatment assignment mechanism $\mathrm{T}_{\vec{B},\mathcal{P}}$ given in (LABEL:eqn:tbp), we have

[TABLE]

The full generality of Lemma A.1 may be of use in a weighted interference model, as the formalism of weights allows one to capture the fact that different connections may have different strengths. Including a weak connection (low weight edge) in $\mathcal{P}$ will affect the bias less than including a strong connection. The following result will imply Lemma A.1.

Lemma A.2.

For all $j=1,\ldots,n$ and $v\in S_{j},$ we have

[TABLE]

when $\mathrm{T}=\mathrm{T}_{\vec{B},\mathcal{P}}.$

Proof.

Without loss of generality, assume that $j=1$ and $v=w^{1}_{1}$ . When $1\in B_{1},$ define a random variable $B^{\prime}_{1}$ with values in $\binom{[r]}{p}$ by choosing $B^{\prime}_{1}$ uniformly from

[TABLE]

Let $B^{\prime}_{i}=B_{i}$ for $i\not=1.$ Denote by $\xi^{\prime}$ (resp. $\xi^{\prime}_{v}$ ) the interference effect $\xi$ (resp. the interference effect $\xi_{v}$ on $v$ ) for the treatment group $\mathrm{T}^{\prime}=\mathrm{T}_{\vec{B}^{\prime},\mathcal{P}}$ .

When $1\in B_{1},$ we have

[TABLE]

Thus, when $1\in B_{1},$ we have

[TABLE]

where $\mathrm{T}\Delta\mathrm{T}^{\prime}=\{v,w\}.$ Taking expectations with respect to $B^{\prime}_{1}$ , it follows that, when $1\in B_{1},$ we have

[TABLE]

By the triangle inequality, we have

[TABLE]

Note that $\mathcal{L}(B_{1}\mid 1\notin B_{1})=\mathcal{L}(B^{\prime}_{1}\mid 1\in B_{1}),$ where $\mathcal{L}$ denotes the law of a random variable. It follows that

[TABLE]

Combining (LABEL:eqn:xixip) and (LABEL:eqn:xisym) and using the fact that

[TABLE]

we obtain the lemma. ∎
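The distributional identity $\mathcal{L}(B_{1}\mid 1\notin B_{1})=\mathcal{L}(B^{\prime}_{1}\mid 1\in B_{1})$ used in the proof above can be checked by exact enumeration for small $r$ and $p$. The sketch below assumes that $B_{1}$ is uniform over $\binom{[r]}{p}$ and that $B^{\prime}_{1}$ is obtained from $B_{1}$ by replacing the index $1$ with an index chosen uniformly from outside $B_{1}$. The set from which $B^{\prime}_{1}$ is drawn is not reproduced in the text here, so this swap rule is an assumption, as are the function names.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

def swapped_law(r, p):
    """Exact law of B' when B is uniform over p-subsets of {1..r} containing 1
    and B' replaces 1 by a uniformly chosen index outside B (assumed coupling)."""
    law = defaultdict(Fraction)
    with_one = [frozenset(c) | {1} for c in combinations(range(2, r + 1), p - 1)]
    for b in with_one:
        outside = [k for k in range(1, r + 1) if k not in b]
        for k in outside:
            b_prime = (b - {1}) | {k}
            law[b_prime] += Fraction(1, len(with_one) * len(outside))
    return dict(law)

def uniform_without_one(r, p):
    """Law of a uniform p-subset of {1..r} conditioned on not containing 1."""
    sets = [frozenset(c) for c in combinations(range(2, r + 1), p)]
    return {s: Fraction(1, len(sets)) for s in sets}

# The two laws coincide exactly, for example when r = 5 and p = 2.
same_law = swapped_law(5, 2) == uniform_without_one(5, 2)
```

Exact rational arithmetic is used so that the comparison is an equality of distributions rather than an approximate one.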

Proof of Lemma A.1.

It follows from Lemma A.2 that

[TABLE]

Summing over $v\in V(G),$ we have

[TABLE]

as desired. ∎

Proof of Lemma 3.3.

From (A) and (LABEL:eqn:lip), it follows that

[TABLE]

with $W_{v}(w)=0$ if $\{v,w\}\notin E(G)$ . The lemma therefore follows from Lemma A.1. ∎

Proof of Proposition 3.6.

By the linearity of expectation, we have

[TABLE]

The proposition follows, by Lemma 3.3. ∎

Proof of Proposition C.1.

By the linearity of expectation, we have

[TABLE]

The proposition follows, by Lemma 3.3. ∎

Appendix B Bounds on MSE: dense case

The following $L^{2}$ bound is the key to the proofs of all of the MSE bounds.

Lemma B.1.

For all $\mathcal{P}=(S_{1},\ldots,S_{n})\in\binom{V(G)}{r,\ldots,r}$ and all $v,v^{\prime}\in S_{j},$ we have

[TABLE]

when $\mathrm{T}=\mathrm{T}_{\vec{B},\mathcal{P}}.$

Proof.

Note that

[TABLE]

where

[TABLE]

Thus, it suffices to prove that

[TABLE]

For $w\in V(G),$ let

[TABLE]

It is clear that

[TABLE]

For all $i,$ let

[TABLE]

Note that $F_{i}=\sum_{w\in S_{i}}F_{w}$ and that

[TABLE]

for $w\not=v,v^{\prime}$ . When $i\not=j$ , we also have

[TABLE]

for $w,w^{\prime}\in S_{i}$ . Thus, for $i\not=j,$ we have

[TABLE]

Similarly, we have

[TABLE]

for $w,w^{\prime}\in S_{j}\setminus\{v,v^{\prime}\},$ so that

[TABLE]

Since $B_{1},\ldots,B_{n}$ are independent (even conditioned on $v\in\mathrm{T}$ and $v^{\prime}\notin\mathrm{T}$ ), it follows that

[TABLE]

so that

[TABLE]

Noting that $F=\sum_{i}F_{i}$ and using the fact that $|F_{v}|\leq\frac{q\cdot\mathrm{1}_{E(G)}(\{v,v^{\prime}\})}{d(v)}$ and $|F_{v^{\prime}}|\leq\frac{p\cdot\mathrm{1}_{E(G)}(\{v,v^{\prime}\})}{d(v^{\prime})},$ it follows that

[TABLE]

and the proof is finished. ∎

For $v\in S_{i},$ define

[TABLE]

Note that

[TABLE]

Proof of Proposition 5.9.

We have

[TABLE]

By Lemma B.1, it follows that

[TABLE]

Noting that $D_{\mathrm{T}}=\frac{1}{pqn}\sum_{v\in V(G)}D^{v}_{\mathrm{T}},$ we have

[TABLE]

as claimed. ∎

Proof of Proposition 7.7.

As in the proof of Proposition 5.9, we have

[TABLE]

By Lemma B.1, it follows that

[TABLE]

Noting that $D^{\pi}_{\mathrm{T}}=\frac{1}{pqn}\sum_{v\in\pi}D^{v}_{\mathrm{T}},$ we have

[TABLE]

Summing over $\pi\in\Pi,$ it follows that

[TABLE]

as claimed. ∎

Appendix C Bounds on MSE: sparse case

As Proposition 3.6 shows, introducing randomness can help reduce bias. We will first need a generalization of Proposition 3.6 to a class of semi-restricted randomizations.

Given a partition $\Pi$ of $V(G)$ , let $\binom{\Pi}{r,\ldots,r}$ denote the set of partitions $\mathcal{P}=(S_{1},\ldots,S_{k})$ of $V(G)$ into sets of size $r$ such that $S_{i}$ lies in an element of $\Pi$ for every $i$ . That is, $\binom{\Pi}{r,\ldots,r}$ is the set of partitions of $V(G)$ into sets of size $r$ that refine $\Pi$ . Assume that the function $f_{v}$ is $K_{v}$ -Lipschitz and define the quantity

[TABLE]

Proposition C.1.

Fix a partition $\Pi$ of $V(G)$ into sets of size divisible by $r$ . When $\mathcal{P}$ is sampled uniformly from $\binom{\Pi}{r,\ldots,r}$ , we have

[TABLE]

when $\mathrm{T}=\mathrm{T}_{\vec{B},\mathcal{P}}.$
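As an illustration of the semi-restricted randomization in Proposition C.1, the following sketch samples $\mathcal{P}$ uniformly from $\binom{\Pi}{r,\ldots,r}$ by shuffling each part of $\Pi$ and cutting it into consecutive blocks of size $r$, then treats $p$ units in each block. The display defining $\mathrm{T}_{\vec{B},\mathcal{P}}$ is not reproduced in this appendix, so the rule "choose $p$ of the $r$ units in each block uniformly and independently" is an assumption based on how $B_{1},\ldots,B_{n}$ are used in the proofs. The function names are hypothetical.

```python
import random

def sample_refinement(parts, r, rng):
    """Uniformly sample a partition into blocks of size r that refines `parts`.

    Shuffling a part and cutting it into consecutive chunks of length r gives
    every partition of that part into r-sets the same probability.
    """
    blocks = []
    for part in parts:
        members = list(part)
        if len(members) % r != 0:
            raise ValueError("each part must have size divisible by r")
        rng.shuffle(members)
        blocks.extend(members[i:i + r] for i in range(0, len(members), r))
    return blocks

def sample_treatment(blocks, p, rng):
    """Treat p units chosen uniformly at random in each block, independently."""
    treated = set()
    for block in blocks:
        treated.update(rng.sample(block, p))
    return treated

# Toy example (hypothetical): two parts of sizes 4 and 6, with r = 2 and p = 1.
rng = random.Random(0)
parts = [[0, 1, 2, 3], [4, 5, 6, 7, 8, 9]]
blocks = sample_refinement(parts, r=2, rng=rng)
treated = sample_treatment(blocks, p=1, rng=rng)
```

Passing an explicit `random.Random` instance keeps the assignment reproducible, which is convenient when the same design is reused across simulation runs.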

C.1 With types

The following lemma will be used in the proof of Proposition 7.8. Recall $K_{\max}$ and $d_{\min}$ from Equations (LABEL:eqn:kmax) and (LABEL:eqn:dmin) respectively.

Lemma C.2.

When $\mathcal{P}$ is sampled uniformly from $\binom{\Pi}{r,\ldots,r}$ , we have

[TABLE]

when $\mathrm{T}=\mathrm{T}_{\vec{B},\mathcal{P}}$ .

Proof.

The proof of Proposition C.1 shows that

[TABLE]

We have

[TABLE]

The lemma follows, by Lemma 3.3. ∎

Proof of Proposition 7.8.

Let $\mathcal{P}=(S_{1},\ldots,S_{n})$ be sampled uniformly from $\binom{\Pi}{r,\ldots,r}$ and let $\mathrm{T}=\mathrm{T}_{\vec{B},\mathcal{P}}$ . For $i=1,2,\ldots,n,$ define

[TABLE]

Note that if $\xi_{i}$ and $\xi_{j}$ are dependent given $\mathcal{P}$ and $i\not=j$ , then either there is an edge between $S_{i}$ and $S_{j}$ or there exists $k$ such that there are edges between $S_{i}$ and $S_{k}$ and between $S_{k}$ and $S_{j}$ . In particular, for fixed $i,$ there are at most $r^{2}d_{\max}^{2}+1$ values of $j$ such that $\xi_{i}$ and $\xi_{j}$ are dependent given $\mathcal{P}$ . By Lemmata B.1 and 6.8, it follows that

[TABLE]

By (B.1) in the proof of Proposition 7.7, we have

[TABLE]

Taking square-roots yields that

[TABLE]

as desired.

The bound on $\mathbb{E}_{\mathrm{T}}\xi$ is given by Proposition C.1. It remains to prove the bound on $\mathbb{E}_{\mathrm{T}}\xi^{2}$ . Eve’s Law (the law of total variance), Lemma C.2, and the previous paragraph together imply that

[TABLE]

as desired. ∎
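The counting step in the proof above, that for fixed $i$ at most $r^{2}d_{\max}^{2}+1$ indices $j$ (including $j=i$) can have $\xi_{j}$ dependent on $\xi_{i}$ given $\mathcal{P}$, can be checked numerically on small graphs. The sketch below counts, for each block, the blocks that are equal to it, joined to it by an edge, or joined to it through one intermediate block. The function name and the toy cycle graph are assumptions made for the example, and the adjacency structure is assumed to be symmetric.

```python
def blocks_within_two_steps(adj, blocks):
    """For each block i, count the blocks j (including j = i) that equal S_i,
    share an edge with S_i, or share an edge with a block adjacent to S_i."""
    block_of = {v: i for i, blk in enumerate(blocks) for v in blk}
    # Blocks joined to block i by at least one edge (excluding i itself).
    nbr_blocks = [
        {block_of[w] for v in blk for w in adj[v]} - {i}
        for i, blk in enumerate(blocks)
    ]
    counts = []
    for i in range(len(blocks)):
        reach = {i} | nbr_blocks[i]
        for k in nbr_blocks[i]:
            reach |= nbr_blocks[k]
        counts.append(len(reach))
    return counts

# Toy example (hypothetical): a cycle on 12 vertices cut into 6 blocks of size 2.
n, r = 12, 2
adj = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
blocks = [[2 * i, 2 * i + 1] for i in range(n // r)]
d_max = max(len(nbrs) for nbrs in adj.values())
counts = blocks_within_two_steps(adj, blocks)
bound = r ** 2 * d_max ** 2 + 1  # the bound used in the proof
```

On this cycle every block reaches 5 blocks within two steps, well below the bound of 17, which reflects that the bound is a worst case over all graphs with the given $d_{\max}$.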

C.2 Without types

The key to the proof of Proposition 6.9 is to note that the treatment group $\mathrm{T}_{\vec{B},\mathcal{P}^{**}}$ has the same distribution as the treatment group $\mathrm{T}_{\Pi}$ for a suitably chosen $\Pi$ .

Proof of Proposition 6.9.

Let $D=d(V(G)\setminus S)$ . For $d\in D,$ let $V_{d}=\{v\in V(G)\setminus S\mid d(v)=d\}.$ Define

[TABLE]

For $d\in D,$ define $g_{V_{d}}=\left.f\right|_{V_{d}}$ . For $1\leq i\leq k$ , define

[TABLE]

It is straightforward to verify that $f$ and $g_{\Pi(v)}$ agree on $\{(a,b)\in\mathbb{Z}_{\geq 0}^{2}\mid a+b=d(v)\}$ for all $v\notin S$ and

[TABLE]

for all $a+b=d(u)$ . Define

[TABLE]

and let

[TABLE]

The discussion of the previous paragraph shows that

[TABLE]

The proposition then follows by bounding $\zeta$ for the treatment $\mathrm{T}=\mathrm{T}_{\Pi}$ using Proposition 7.8. ∎

Appendix D Homophily

Proof of Proposition 8.1.

The first assertion follows from Lemma 2.1 because $\mathbb{P}(v\in\mathrm{T})=\frac{p}{r}$ for all $v\in V(G)$ .

As $|\mathrm{T}\cap\pi|=\frac{p|\pi|}{r}$ for all $\pi\in\Pi,$ we have

[TABLE]

Note that $\operatorname{Corr}\left(\chi^{\mathrm{T}}_{v},\chi^{\mathrm{T}}_{w}\right)=-\frac{1}{r-1}$ if $v\not=w$ lie in a single part of $\Pi$ and $\operatorname{Corr}\left(\chi^{\mathrm{T}}_{v},\chi^{\mathrm{T}}_{w}\right)=0$ if $v$ and $w$ lie in different parts of $\Pi$ . Thus, we have

[TABLE]

for all $v\in V(G)$ . By Lemma 6.7, it follows that

[TABLE]

as desired. ∎
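The two probabilistic facts used in the proof above, the marginal treatment probability $\frac{p}{r}$ and the pairwise correlation $-\frac{1}{r-1}$, can be verified by exact enumeration in the simplest setting. The sketch below assumes that the treated units within a set of size $r$ form a uniformly chosen subset of size $p$ with $0<p<r$. That this matches the design of Proposition 8.1 is an assumption, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def indicator_moments(r, p):
    """Exact P(1 in B) and Corr(1{1 in B}, 1{2 in B}) for B uniform over the
    p-subsets of {1, ..., r}, with 0 < p < r."""
    subsets = list(combinations(range(1, r + 1), p))
    n = len(subsets)
    p1 = Fraction(sum(1 in s for s in subsets), n)              # P(1 in B)
    p12 = Fraction(sum(1 in s and 2 in s for s in subsets), n)  # P(1, 2 in B)
    cov = p12 - p1 * p1
    var = p1 * (1 - p1)  # by symmetry, both indicators have this variance
    return p1, cov / var

# Example: r = 5 and p = 2 give probability 2/5 and correlation -1/4.
prob, corr = indicator_moments(5, 2)
```

The correlation depends on $r$ only, not on $p$, in agreement with the value $-\frac{1}{r-1}$ quoted in the proof.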

