Respondent driven sampling and sparse graph convergence

Siva Athreya; Adrian Röllin

arXiv:1705.02731·math.PR·May 9, 2017


TL;DR

This paper studies a respondent-driven sampling procedure governed by a graphon and shows that, when the sequence of sampled vertex-sets is stationary, the sparse graphs built by clumping the samples converge almost surely to the graphon in the cut-metric.

Contribution

It introduces a novel approach to analyze respondent-driven sampling via graphon convergence and develops a specific clumping procedure for sparse graph construction.

Findings

01 Sparse graphs constructed via the clumping procedure converge to the governing graphon in the cut-metric.

02 Stationarity of the sequence of vertex-sets is the key assumption for convergence.

03 The analysis uses a concentration inequality for Markov chains and the Stein-Chen method.

Abstract

We consider a particular respondent-driven sampling procedure governed by a graphon. By a specific clumping procedure of the sampled vertices we construct a sequence of sparse graphs. If the sequence of the vertex-sets is stationary, then the sequence of sparse graphs converges to the governing graphon in the cut-metric. The tools used are a concentration inequality for Markov chains and the Stein-Chen method.

Equations

\|\kappa\|_{\square}\vcentcolon=\sup_{S,T\subseteq[0,1]}\biggl{\lvert}\int_{S\times T}\kappa(x,y)dxdy\biggr{\rvert},


\|\kappa\|_{1}\vcentcolon=\int_{[0,1]\times[0,1]}\lvert\kappa(x,y)\rvert\,dx\,dy.

d_{\square}(\kappa_{1},\kappa_{2})\vcentcolon=\|\kappa_{1}-\kappa_{2}\|_{\square},\qquad d_{1}(\kappa_{1},\kappa_{2})\vcentcolon=\|\kappa_{1}-\kappa_{2}\|_{1}.

\delta_{\square}(\kappa_{1},\kappa_{2})\vcentcolon=\inf_{\sigma}d_{\square}(\kappa_{1}^{\sigma},\kappa_{2}),

\delta_{\square}(G,G^{\prime})\vcentcolon=\delta_{\square}(\kappa_{G},\kappa_{G^{\prime}}).

d_{\square}(G,G^{\prime})=\max_{S,T\subseteq V(G)}\biggl{\lvert}\sum_{i\in S,j\in T}\beta_{ij}(G)-\beta_{ij}(G^{\prime})\biggr{\rvert}.


d_{\square}(G,\kappa)\vcentcolon=d_{\square}(\kappa_{G},\kappa),\qquad\delta_{\square}(G,\kappa)\vcentcolon=\delta_{\square}(\kappa_{G},\kappa).

d_{\square}\bigl{(}G_{n}/\rho_{n},\kappa\bigr{)}\to 0\qquad\text{and}\qquad\delta_{\square}\bigl{(}G_{n}/\|{G_{n}}\|_{1},\kappa/\|\kappa\|_{1}\bigr{)}\to 0


\operatorname{\mathbbm{P}}[X_{m+1}\in dy\mid X_{m}=x]=\frac{\kappa(x,y)}{\int_{0}^{1}\kappa(z,x)\,dz}\,dy.

\pi(dx)=\frac{\int_{0}^{1}\kappa(x,u)\,du}{\int_{0}^{1}\int_{0}^{1}\kappa(u,v)\,du\,dv}\,dx.

\operatorname{\mathbbm{P}}[X_{m}\in dx,X_{m+1}\in dy]=\pi(dx)\frac{\kappa(x,y)}{\int_{0}^{1}\kappa(x,z)\,dz}\,dy=\frac{\kappa(x,y)}{\int_{0}^{1}\int_{0}^{1}\kappa(u,v)\,du\,dv}\,dx\,dy.

I_{n}(i,j)=\begin{cases}1&\text{if there exists $0\leqslant m<N$ such that either $X_{m}\in A_{n,i}$ and $X_{m+1}\in A_{n,j}$, or $X_{m}\in A_{n,j}$ and $X_{m+1}\in A_{n,i}$,}\\ 0&\text{otherwise.}\end{cases}

0<\delta\leqslant\frac{\kappa(x,y)}{\int_{0}^{1}\kappa(x,z)\,dz}\leqslant\varphi(y),\qquad 0\leqslant x,y\leqslant 1.

\lim_{n\to\infty}\frac{N}{n^{1+\alpha}}=\lambda.

\lim_{n\to\infty}d_{\square}\biggl{(}\frac{n^{2}}{N}G_{n}\,\,,\,\,\frac{\kappa}{\|\kappa\|_{1}}\biggr{)}=0


\beta_{ij}(\mathbbm{E}G_{n})=\frac{1}{2}\mathbbm{E}I_{n}(i,j).

\beta_{ij}(H_{n})=\frac{n^{2}}{2N}\biggl{(}1-\exp\Bigl{(}-\frac{2N}{n^{2}}\mu_{n}(i,j)\Bigr{)}\biggr{)},\quad\text{where }\mu_{n}(i,j)=n^{2}\int_{A_{n,i}}\int_{A_{n,j}}\frac{\kappa(x,y)}{\|\kappa\|_{1}}dxdy.


\begin{split}\lim_{n\to\infty}d_{\square}\biggl{(}\frac{n^{2}}{N}G_{n},\frac{n^{2}}{N}\mathbbm{E}G_{n}\biggr{)}=0\qquad\text{almost surely w.r.t.\ }\operatorname{\mathbbm{P}}.\end{split}


\begin{split}&d_{\square}\biggl{(}\frac{n^{2}}{N}G_{n},\frac{n^{2}}{N}\mathbbm{E}G_{n}\biggr{)}\\ &\qquad=\sup\limits_{\begin{subarray}{c}S=\bigcup_{m=1}^{k}A_{n,i_{m}},\\ T=\bigcup_{m=1}^{l}A_{n,j_{m}}\end{subarray}}\biggl{\lvert}\sum\limits_{i:i/n\in S}\sum\limits_{j:j/n\in T}\frac{n^{2}}{2N}\bigl{(}I_{n}(i,j)-\mathbbm{E}I_{n}(i,j)\bigr{)}\mathop{\mathrm{Vol}}(A_{n,i})\mathop{\mathrm{Vol}}(A_{n,j})\biggr{\rvert}\\ &\qquad=\sup\limits_{\begin{subarray}{c}S=\bigcup_{m=1}^{k}A_{n,i_{m}},\\ T=\bigcup_{m=1}^{l}A_{n,j_{m}}\end{subarray}}\biggl{\lvert}f_{S,T}(X)-\mathbbm{E}f_{S,T}(X)\biggr{\rvert},\end{split}


\operatorname{\mathbbm{P}}\bigl{[}\lvert f_{S,T}(X)-\mathbbm{E}f_{S,T}(X)\rvert>\varepsilon\bigr{]}\leqslant 2\exp\bigl{(}-NC\varepsilon^{2}\bigr{)},


\begin{split}\operatorname{\mathbbm{P}}\bigl{[}\sup_{\begin{subarray}{c}S=\bigcup_{m=1}^{k}A_{n,i_{m}},\\ T=\bigcup_{m=1}^{l}A_{n,j_{m}}\end{subarray}}\lvert f_{S,T}(X)-\mathbbm{E}f_{S,T}(X)\rvert>\varepsilon\bigr{]}&\leqslant 2^{2n+1}\exp\bigl{(}-NC\varepsilon^{2}\bigr{)}\\ &=\exp\bigl{(}-NC\varepsilon^{2}+(2n+1)\log 2\bigr{)}.\end{split}


\begin{split}\lim_{n\to\infty}d_{\square}\biggl{(}\frac{n^{2}}{N}\mathbbm{E}G_{n},H_{n}\biggr{)}=0.\end{split}


E_{n}(i,j)=\sum_{m=1}^{N}I_{m},

\mathbbm{E}E_{n}(i,j)=2N\int_{A_{n,i}}\int_{A_{n,j}}\frac{\kappa(x,y)}{\|\kappa\|_{1}}\,dx\,dy=\frac{2N}{n^{2}}\mu_{n}(i,j).

\begin{split}&d_{\square}\Bigl{(}\frac{n^{2}}{N}\mathbbm{E}G_{n},H_{n}\Bigr{)}\leqslant d_{1}\Bigl{(}\frac{n^{2}}{N}\mathbbm{E}G_{n},H_{n}\Bigr{)}\\ &\qquad=\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{n^{2}}{2N}\mathop{\mathrm{Vol}}(A_{n,i})\mathop{\mathrm{Vol}}(A_{n,j})\bigl{\lvert}\mathbbm{E}I_{n}(i,j)-\bigl{(}1-\exp(-\mathbbm{E}E_{n}(i,j))\bigr{)}\bigr{\rvert}\\ &\qquad=\frac{1}{2N}\sum_{i=1}^{n}\sum_{j=1}^{n}\bigl{\lvert}\mathbbm{E}I_{n}(i,j)-(1-\exp\bigl{(}-\mathbbm{E}E_{n}(i,j)\bigr{)})\bigr{\rvert}\\ &\qquad=\frac{1}{2N}\sum_{i=1}^{n}\sum_{j=1}^{n}\lvert\operatorname{\mathbbm{P}}[E_{n}(i,j)=0]-\operatorname{\mathbbm{P}}[Z_{n}(i,j)=0]\rvert,\end{split}


\mathop{d_{\mathrm{TV}}}\bigl{(}\mathscr{L}(E_{n}(i,j)),\mathop{\mathrm{Poisson}}\bigl{(}{\textstyle\frac{2N}{n^{2}}}\mu_{n}(i,j)\bigr{)}\bigr{)}\leqslant\mathbbm{E}\bigl{\lvert}E_{n}(i,j)-(E^{s}_{n}(i,j)-1)\bigr{\rvert},


\mathscr{L}(X^{\prime})=\mathscr{L}(X\mid I_{M}=1)=\mathscr{L}\bigl(X\bigm|(X_{M-1},X_{M})\in A_{n,ij}\cup A_{n,ji}\bigr).

\mathscr{L}\bigl((I^{\prime}_{m})_{0\leqslant m\leqslant n}\bigr)=\mathscr{L}\bigl((I_{m})_{0\leqslant m\leqslant n}\bigm|I_{M}=1\bigr).

E_{n}^{s}(i,j)=\sum_{m=1}^{N}I^{\prime}_{m}

\operatorname{\mathbbm{P}}[X_{m+1}\in dy\mid X_{m}=x]\geqslant\delta\,dy.


Full text

RESPONDENT DRIVEN SAMPLING AND SPARSE GRAPH CONVERGENCE

Siva Athreya¹

Adrian Röllin²


¹ Indian Statistical Institute, 8th Mile Mysore Road, Bangalore, 560059 India. Email: [email protected]

² Department of Statistics and Applied Probability, National University of Singapore, 6 Science Drive 2, Singapore 117546. Email: [email protected]

2000 Mathematics Subject Classification. Primary 05C80, 60J20; Secondary 37A30, 9482.

Keywords. Respondent Driven Sampling; random graph; sparse graph limits; dense graph limits.

1 Introduction

Respondent Driven Sampling (RDS), popularised by Heckathorn (1997), is a method to sample from hard-to-reach populations, such as drug users, MSM and people with HIV, and it is being routinely used in studies involving such populations. The sampling procedure is subject to various biases, one of which is a bias towards individuals with higher degrees, as these are more likely to appear in the sample.

How this bias affects the network as a whole has been described by Athreya and Röllin (2016) in the context of dense graph limits. The model considered there is defined in terms of a two-step procedure. First, vertices are sampled according to an ergodic process (the important point to note is that the vertices need not be sampled independently of each other). Second, edges between vertices are sampled independently of each other, where the probability of an edge is determined via a graphon representing the underlying network.

Dense graphs are at one extreme of graph sequences. These are graphs on $n$ vertices with the number of edges being of order $n^{2}$, which is far more than what is observed in real world networks. At the opposite end are sequences of graphs with bounded (average) degree and consequently having order $n$ edges. These have a separate limiting theory which is not quite applicable to many real world networks. There is a class of graph sequences between these two extremes, called sparse graphs; these are graphs for which the average degree grows with the number of vertices, but only at sub-linear speed.

The purpose of this note is to extend the work of Athreya and Röllin (2016) to sparse graphs, and to consider more realistic models of sampling. Since RDS data typically comes in the form of trees, the actual graphs are those with average degrees remaining bounded as the number of nodes $n$ grows. We propose a model where “close enough” participants are “clumped” together so that the average degree now grows in $n$. Our main result is that the random sparse graph sequence obtained through a specific respondent-driven sampling procedure converges almost surely to the graphon underlying the network in the cut-metric, provided the sequence of the vertex-sets is stationary.

The method of proof in this article is entirely different from that of Athreya and Röllin (2016). This is mainly due to the fact that, unlike in the dense case, subgraph counts no longer characterise graph convergence. We compare our random sparse graph sequence with an “expected” (deterministic) sparse graph via a concentration inequality. We then use the Stein-Chen method to compare this deterministic sparse graph to a sequence of graphs which are close to the graphon of the underlying network.

The rest of the article is organised as follows. In Section 2 we provide a brief introduction to sparse graph convergence. In Section 3 we describe our model and state our main result (Theorem 3.1). We present the proof of the main result in Section 4. We then conclude with some remarks in a final discussion section on Respondent Driven Sampling and Dense graph sequences.

Acknowledgements:

Adrian Röllin was supported by NUS Research Grant R-155-000-167-112. Siva Athreya was supported by a CPDA grant from the Indian Statistical Institute and an ISF-UGC project grant.

2 Sparse graph convergence

This section is a very brief introduction to sparse graph convergence. The study of sparse graph convergence was initiated by Bollobás and Riordan (2009), and the $L_{p}$ theory was then established in Borgs et al. (2014a) and Borgs et al. (2014b). We present the minimal amount of material necessary to formulate and prove our main result. We first define weighted graphs, then graphons, and conclude with a brief discussion of a convergence result.

Weighted graphs.

Consider a graph $G$, given by its set of vertices $V(G)$ and set of edges $E(G)$. An (edge-)weighted graph $G$ is simply a graph which has, in addition, a weight function $\beta(G)=(\beta_{ij}(G))_{i,j\in V(G)}$, where, for each $\{i,j\}\in E(G)$, we interpret the value $\beta_{ij}(G)$ as the weight of that edge. By making the convention that $\beta_{ij}(G)=0$ whenever there is no edge between vertices $i$ and $j$, the information about $E(G)$ is contained in $\beta(G)$, so that any weighted graph is determined by $V(G)$ and $\beta(G)$. Moreover, any unweighted graph can be interpreted as a weighted graph by setting $\beta_{ij}(G)=1$ whenever $\{i,j\}\in E(G)$.

For any weighted graph $G$ and any constant $c\in\mathbbm{R}$ , we shall define $cG$ to be the weighted graph on the same set of vertices and edge weights $\beta_{ij}(cG)=c\beta_{ij}(G)$ .

Graphons.

A graphon is any symmetric function $\kappa:[0,1]^{2}\to\mathbbm{R}_{+}$ which is integrable; note that we restrict ourselves to non-negative graphons, whereas Borgs et al. (2014a) allow for more general graphons. For any graphon $\kappa$, the cut-norm of $\kappa$ is defined as

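$$\|\kappa\|_{\square}\vcentcolon=\sup_{S,T\subseteq[0,1]}\biggl\lvert\int_{S\times T}\kappa(x,y)\,dx\,dy\biggr\rvert,$$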

where the supremum is taken over Lebesgue-measurable subsets of $[0,1]$ . The $L_{1}$ -norm of $\kappa$ is given by

$$\|\kappa\|_{1}\vcentcolon=\int_{[0,1]\times[0,1]}\lvert\kappa(x,y)\rvert\,dx\,dy.$$

For any two graphons $\kappa_{1}$ and $\kappa_{2}$ , we let

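$$d_{\square}(\kappa_{1},\kappa_{2})\vcentcolon=\|\kappa_{1}-\kappa_{2}\|_{\square},\qquad d_{1}(\kappa_{1},\kappa_{2})\vcentcolon=\|\kappa_{1}-\kappa_{2}\|_{1}.$$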

Since a Lebesgue measure preserving transformation of $[0,1]$ will not change the norm of a graphon, it is customary to define the cut-metric $\delta_{\square}$ on graphons by

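$$\delta_{\square}(\kappa_{1},\kappa_{2})\vcentcolon=\inf_{\sigma}d_{\square}(\kappa_{1}^{\sigma},\kappa_{2}),$$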

where the infimum ranges over all measure-preserving bijections $\sigma:[0,1]\to[0,1]$ , and where the graphon $\kappa^{\sigma}$ is defined as $\kappa^{\sigma}(x,y)=\kappa(\sigma(x),\sigma(y))$ .

Every weighted graph $G$ is naturally associated with a graphon $\kappa_{G}$ in the following way. First, divide the interval $[0,1]$ into intervals $I_{1},\dots,I_{\lvert V(G)\rvert}$ of length $1/\lvert V(G)\rvert$, one for each $i\in V(G)$. The function $\kappa_{G}$ is then given the constant value $\beta_{ij}(G)$ on $I_{i}\times I_{j}$ for every $i,j\in V(G)$. It is easily verified that $\kappa_{G}$ is indeed a graphon.
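The step-function construction can be sketched in a few lines of Python. This is only an illustration: the function name `graphon_of_weighted_graph`, the 0-based indexing of vertices and the choice of half-open intervals (with the point 1 assigned to the last interval) are assumptions, not part of the source.

```python
import numpy as np

def graphon_of_weighted_graph(beta):
    """Return kappa_G as a callable on [0,1]^2 for a symmetric weight matrix beta.

    Hypothetical helper: vertex i (0-based) is identified with the interval
    [i/n, (i+1)/n), and the right endpoint 1 is assigned to the last interval.
    """
    beta = np.asarray(beta, dtype=float)
    n = beta.shape[0]

    def kappa_G(x, y):
        i = min(int(x * n), n - 1)  # index of the interval containing x
        j = min(int(y * n), n - 1)  # index of the interval containing y
        return beta[i, j]           # constant value beta_ij on I_i x I_j

    return kappa_G
```

For a two-vertex graph with a single edge of weight 2, `graphon_of_weighted_graph([[0, 2], [2, 0]])` takes the value 2 on the two off-diagonal quarter squares and 0 elsewhere.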

Thus, even if $G$ and $G^{\prime}$ have different sets of vertices, we can define their cut-distance through the cut-distance of their associated graphons; that is,

$$\delta_{\square}(G,G^{\prime})\vcentcolon=\delta_{\square}(\kappa_{G},\kappa_{G^{\prime}}).$$

If two weighted graphs $G$ and $G^{\prime}$ have the same set of vertices $V(G)$ , then it is clear that we can express their cut-distance as

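$$d_{\square}(G,G^{\prime})=\max_{S,T\subseteq V(G)}\biggl\lvert\sum_{i\in S,j\in T}\beta_{ij}(G)-\beta_{ij}(G^{\prime})\biggr\rvert.$$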

Finally, if $\kappa$ is a graphon and $G$ is a weighted graph, then we will define

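$$d_{\square}(G,\kappa)\vcentcolon=d_{\square}(\kappa_{G},\kappa),\qquad\delta_{\square}(G,\kappa)\vcentcolon=\delta_{\square}(\kappa_{G},\kappa).$$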
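For small graphs on a common vertex set, the labelled cut-distance can be computed by brute force, which may help build intuition. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the factor $1/n^{2}$ is included on the assumption that the distance is taken between the associated graphons, where each cell $I_{i}\times I_{j}$ has area $1/n^{2}$; pass `normalise=False` to obtain the raw maximum over vertex subsets. For each choice of $S$ the optimal $T$ collects either all columns with positive sum or all columns with negative sum, so only $2^{n}$ subsets need to be enumerated.

```python
import numpy as np

def cut_distance_same_vertices(beta_g, beta_h, normalise=True):
    """Brute-force labelled cut distance between two weighted graphs on [n].

    Exponential in n, so only meant for very small examples.
    """
    d = np.asarray(beta_g, dtype=float) - np.asarray(beta_h, dtype=float)
    n = d.shape[0]
    best = 0.0
    for mask in range(1 << n):
        # Boolean selector for the row set S encoded by the bits of mask.
        in_s = np.array([(mask >> i) & 1 for i in range(n)], dtype=bool)
        col_sums = d[in_s].sum(axis=0)        # sum over i in S, for each column j
        pos = col_sums[col_sums > 0].sum()    # best T if the total is positive
        neg = -col_sums[col_sums < 0].sum()   # best T if the total is negative
        best = max(best, pos, neg)
    return best / n**2 if normalise else best
```

For instance, comparing a single edge on two vertices with the empty graph gives a raw maximum of 2 (both ordered pairs count) and a normalised value of 0.5.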

Convergence to graphon.

Let $\kappa$ be a graphon with $\|\kappa\|_{1}>0$ . Let $\rho_{n}>0$ satisfy $\rho_{n}\to 0$ and $n\rho_{n}\to\infty$ as $n\to\infty$ . Let the vertex set be given by $[n]=\{1,2,\dots,n\}$ . Let $U_{1},\dots,U_{n}$ be i.i.d. chosen uniformly in $[0,1]$ .

Define $G_{n}\equiv G(n,\kappa,\rho_{n})$ to be the graph defined by connecting $i$ and $j$ with probability $\min\{\rho_{n}\kappa(U_{i},U_{j}),1\}$ . It is clear that $G_{n}$ is a sparse graph sequence and in (Borgs et al., 2014a, Theorem 2.14 and Corollary 2.15) it is shown that, with probability $1$ ,

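$$d_{\square}\bigl(G_{n}/\rho_{n},\kappa\bigr)\to 0\qquad\text{and}\qquad\delta_{\square}\bigl(G_{n}/\|G_{n}\|_{1},\kappa/\|\kappa\|_{1}\bigr)\to 0$$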

as $n\to\infty$ . In this article we generalise the above result when the vertex labels come from a Markov Chain and the sparse graph is constructed after suitable clumping.
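The i.i.d. model $G(n,\kappa,\rho_{n})$ described above can be simulated as follows. This is a minimal sketch, assuming that `kappa` is a symmetric function that accepts NumPy arrays and broadcasts; the function name and the convention of excluding self-loops are illustrative choices, not taken from the source.

```python
import numpy as np

def sample_sparse_w_random_graph(n, kappa, rho, rng):
    """Sample G(n, kappa, rho): connect i and j with prob. min(rho*kappa(U_i,U_j), 1)."""
    u = rng.uniform(size=n)                                   # i.i.d. uniform labels U_1..U_n
    p = np.minimum(rho * kappa(u[:, None], u[None, :]), 1.0)  # edge probabilities
    # Decide each unordered pair once (strict upper triangle), then symmetrise.
    upper = np.triu(rng.uniform(size=(n, n)) < p, k=1)
    adj = upper | upper.T
    return u, adj.astype(int)
```

A typical call would be `sample_sparse_w_random_graph(1000, lambda x, y: x + y, 1000 ** -0.5, np.random.default_rng(0))`, where the choice $\rho_{n}=n^{-1/2}$ satisfies $\rho_{n}\to 0$ and $n\rho_{n}\to\infty$.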

3 Model and main results

3.1 Constructing a random graph from RDS

We shall construct a sparse graph on the vertex set $[n]$ driven by Respondent Driven Sampling (RDS). We will sample $N$ individuals, labelled $X_{1},\dots,X_{N}$, where $X_{i}\in[0,1]$. We note that the label space is chosen arbitrarily to be the unit interval only for the sake of mathematical convenience. After sampling, the individuals are clumped into $n$ equally spaced bins, which we represent by the intervals $A_{n,i}=[(i-1)/n,i/n)$, where $1\leqslant i\leqslant n$ (it is understood that $A_{n,n}$ also includes the right-most point 1). We connect $i$ and $j$ if two successive individuals fall into bin $A_{n,i}$ followed by bin $A_{n,j}$ or vice-versa. We choose $N$ in such a way that the graph constructed is sparse and we establish an $L_{1}$ limit for the same. We begin with a precise definition of the sampling scheme via a Markov chain.

Markov Chain representing RDS.

Let $\kappa$ be a graphon. Let $(\Omega,{\cal F},\operatorname{\mathbbm{P}})$ be a probability space, on which we define a Markov chain $X=\{X_{k}\}_{k\geqslant 0}$ with transition probabilities given by

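$$\operatorname{\mathbbm{P}}[X_{m+1}\in dy\mid X_{m}=x]=\frac{\kappa(x,y)}{\int_{0}^{1}\kappa(z,x)\,dz}\,dy.$$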

Since $\kappa$ is symmetric, the Markov chain is time-reversible with stationary distribution

$$\pi(dx)=\frac{\int_{0}^{1}\kappa(x,u)\,du}{\int_{0}^{1}\int_{0}^{1}\kappa(u,v)\,du\,dv}\,dx.$$

We shall assume that $X_{0}~{}\stackrel{{\scriptstyle\mathscr{D}}}{{=}}\pi$ , which means the chain is stationary. Then the probability of seeing a transition from $dx$ to $dy$ is given by

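$$\operatorname{\mathbbm{P}}[X_{m}\in dx,X_{m+1}\in dy]=\pi(dx)\frac{\kappa(x,y)}{\int_{0}^{1}\kappa(x,z)\,dz}\,dy=\frac{\kappa(x,y)}{\int_{0}^{1}\int_{0}^{1}\kappa(u,v)\,du\,dv}\,dx\,dy.$$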
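One way to simulate this stationary chain is by rejection sampling. The sketch below is not the authors' procedure; it assumes that $\kappa$ is bounded above by a known constant `kappa_max` (an extra assumption made only for the sampler) and is not identically zero. A uniform proposal $y$ is accepted with probability $\kappa(x,y)/\kappa_{\max}$, which yields the transition density proportional to $\kappa(x,\cdot)$; the starting point is the first coordinate of a pair drawn from the joint density proportional to $\kappa$, whose marginal is $\pi$.

```python
import numpy as np

def simulate_rds_chain(kappa, kappa_max, N, rng):
    """Return X_0, ..., X_N for the stationary chain driven by the graphon kappa.

    Hypothetical helper; requires 0 <= kappa <= kappa_max on [0,1]^2.
    """
    # Stationary start: accept (x, y) uniform on the square w.p. kappa(x, y)/kappa_max;
    # the x-coordinate of an accepted pair has density proportional to int kappa(x, u) du.
    while True:
        x, y = rng.uniform(), rng.uniform()
        if rng.uniform() * kappa_max < kappa(x, y):
            break
    chain = [x]
    for _ in range(N):
        # Transition: propose y uniformly, accept w.p. kappa(x, y)/kappa_max.
        while True:
            y = rng.uniform()
            if rng.uniform() * kappa_max < kappa(chain[-1], y):
                break
        chain.append(y)
    return np.array(chain)
```

For example, `simulate_rds_chain(lambda x, y: 1.0 + x * y, 2.0, 1000, np.random.default_rng(0))` returns 1001 labels in $[0,1)$.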

Sparse Random Graph from RDS.

Let $\kappa$ be a graphon. Let $n\geqslant 1$ and $N\equiv N(n)$ . We will now construct a random graph $G(n,N,X,\kappa)$ via the following steps:

•

Let the vertex set be $[n]\vcentcolon=\{1,2,\ldots,n\}$ .

•

Let $X_{1},\dots,X_{N}$ be a realisation of the stationary Markov Chain defined in the previous section up to time $N$ .

•

Equi-partition the unit interval by the intervals $A_{n,1},\dots,A_{n,n}$ . For $1\leqslant i,j\leqslant n$ with $i\neq j$ , define

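$$I_{n}(i,j)=\begin{cases}1&\text{if there exists $0\leqslant m<N$ such that either $X_{m}\in A_{n,i}$ and $X_{m+1}\in A_{n,j}$, or $X_{m}\in A_{n,j}$ and $X_{m+1}\in A_{n,i}$,}\\ 0&\text{otherwise.}\end{cases}$$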

•

For $1\leqslant i,j\leqslant n$ with $i\neq j$ , connect $i$ and $j$ if $I_{n}(i,j)=1$ , and leave it unconnected otherwise.

If we choose $N(n)$ appropriately (i.e. $N(n)=o(n^{2})$ ) then the above random graph will be a sparse random graph sequence.
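The clumping steps listed above can be sketched as follows. The function name is hypothetical and vertices are indexed from 0 for convenience; the input is any sequence of labels in $[0,1]$, for instance a simulated path of the Markov chain.

```python
import numpy as np

def clumped_rds_graph(samples, n):
    """Adjacency matrix of the graph on n bins built from successive samples.

    Bin i (0-based) stands for A_{n,i+1} = [i/n, (i+1)/n); the label 1 goes to the last bin.
    """
    samples = np.asarray(samples, dtype=float)
    bins = np.minimum((samples * n).astype(int), n - 1)  # bin index of each sample
    a, b = bins[:-1], bins[1:]                           # successive pairs (X_m, X_{m+1})
    keep = a != b                                        # the indicator is only defined for i != j
    adj = np.zeros((n, n), dtype=int)
    adj[a[keep], b[keep]] = 1
    adj[b[keep], a[keep]] = 1                            # "or vice-versa": undirected edges
    return adj
```

With `samples = [0.05, 0.55, 0.95, 0.96, 0.05]` and `n = 4`, the successive bins are 0, 2, 3, 3, 0, so the graph has the three edges {0,2}, {2,3} and {0,3}; the transition inside bin 3 creates no edge.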

3.2 Main Result

Let $\kappa$ be a given graphon, and consider the sparse graph sequence $G_{n}\equiv G(n,N,X,\kappa)$ defined as in the previous paragraph. We shall make the following assumptions on $\kappa$ and $N$ .

** Assumption (K1).**

There exist a constant $\delta>0$ and an integrable function $\varphi:[0,1]\to\mathbbm{R}_{+}$ such that

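$$0<\delta\leqslant\frac{\kappa(x,y)}{\int_{0}^{1}\kappa(x,z)\,dz}\leqslant\varphi(y),\qquad 0\leqslant x,y\leqslant 1.\tag{3.2}$$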

** Assumption (N1).**

There are constants $\alpha$ and $\lambda$ , where $0<\alpha\leqslant 1$ and $\lambda>0$ , such that the sequence $N\equiv N(n)$ satisfies

$$\lim_{n\to\infty}\frac{N}{n^{1+\alpha}}=\lambda.\tag{3.3}$$
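To see why Assumption (N1) with $\alpha<1$ leads to a sparse sequence, the following short calculation may help. It is a sketch that uses only the observation that each of the $N$ transitions creates at most one edge, so that $G_{n}$ has at most $N$ edges; the numerical values in the last comment are purely illustrative.

```latex
% Sketch: edge density under (N1), using |E(G_n)| <= N.
\frac{\lvert E(G_{n})\rvert}{\binom{n}{2}}
  \leqslant \frac{2N}{n(n-1)}
  \sim 2\lambda\, n^{\alpha-1} \longrightarrow 0
  \quad (0<\alpha<1),
\qquad
\frac{N}{n} \sim \lambda\, n^{\alpha} \longrightarrow \infty .
% Illustration: alpha = 1/2, lambda = 1, n = 10^4 gives N ~ 10^6 and N/n^2 ~ 10^{-2}.
```

The edge density therefore vanishes, while the number of transitions per bin still grows; the factor $n^{2}/N$ is the rescaling that compensates for the vanishing density.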

We are now ready to state the main result.

Theorem 3.1.

Under Assumption (K1) and Assumption (N1), and if $0<\alpha<1$ ,

$$\lim_{n\to\infty}d_{\square}\biggl(\frac{n^{2}}{N}G_{n}\,,\,\frac{\kappa}{\|\kappa\|_{1}}\biggr)=0$$

almost surely with respect to $\operatorname{\mathbbm{P}}$ .

4 Proof of Theorem 3.1

To prove our result we will need to define two (deterministic and intermediate) weighted graphs. The first graph is an “averaged” version of $G_{n}$ , which we shall denote by $\mathbbm{E}G_{n}$ ; it is the weighted graph on the vertices $[n]$ with edge weights

$$\beta_{ij}(\mathbbm{E}G_{n})=\frac{1}{2}\mathbbm{E}I_{n}(i,j).$$

Let $\frac{n^{2}}{N}\mathbbm{E}G_{n}$ be the weighted graph obtained by scaling the weights of $\mathbbm{E}G_{n}$ by $\frac{n^{2}}{N}$ (as described in Section 2). The second graph, denoted by $H_{n}$, is the weighted graph on the vertices $[n]$ with edge weights

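$$\beta_{ij}(H_{n})=\frac{n^{2}}{2N}\biggl(1-\exp\Bigl(-\frac{2N}{n^{2}}\mu_{n}(i,j)\Bigr)\biggr),\quad\text{where }\mu_{n}(i,j)=n^{2}\int_{A_{n,i}}\int_{A_{n,j}}\frac{\kappa(x,y)}{\|\kappa\|_{1}}\,dx\,dy.\tag{4.1}$$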

For $x,y\in[0,1]$, let $i_{n}$ and $j_{n}$ be such that $x\in A_{n,i_{n}}$ and $y\in A_{n,j_{n}}$ for all $n\geqslant 1$. Observe that by the Lebesgue density theorem, $\kappa(x,y)/\|\kappa\|_{1}=\lim_{n\to\infty}\mu_{n}(i_{n},j_{n})$ almost everywhere on $[0,1]^{2}$.
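The edge weights of $H_{n}$ can be approximated numerically, which makes the role of the exponential visible: when $2N/n^{2}$ is small, $\beta_{ij}(H_{n})$ is close to the cell average $\mu_{n}(i,j)$. The sketch below uses a midpoint rule with `m` points per bin and axis; the function name, the quadrature and the requirement that `kappa` broadcasts over NumPy arrays are assumptions made for the illustration.

```python
import numpy as np

def h_n_weights(kappa, n, N, m=8):
    """Approximate mu_n(i,j) and beta_ij(H_n) by a midpoint rule on an (n*m) x (n*m) grid."""
    pts = (np.arange(n * m) + 0.5) / (n * m)            # midpoints of the fine grid
    vals = kappa(pts[:, None], pts[None, :])
    l1 = vals.mean()                                    # approximates ||kappa||_1
    # mu_n(i,j) = n^2 * (integral of kappa/||kappa||_1 over the cell) = cell average.
    mu = vals.reshape(n, m, n, m).mean(axis=(1, 3)) / l1
    scale = 2.0 * N / n**2
    beta = (1.0 - np.exp(-scale * mu)) / scale          # (n^2/2N) * (1 - exp(-(2N/n^2) mu))
    return mu, beta
```

Since $1-e^{-x}\leqslant x$, the returned `beta` never exceeds `mu` entrywise, and the two agree to first order as `2 * N / n**2` tends to zero.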

Our strategy will be to show that, for large $n$ , $\frac{n^{2}}{N}G_{n}$ is close to $\frac{n^{2}}{N}\mathbbm{E}G_{n}$ , followed by the fact that $\frac{n^{2}}{N}\mathbbm{E}G_{n}$ is close to $H_{n}$ , and finally that $H_{n}$ is close to $\kappa/\|\kappa\|_{1}$ .

We start with the first lemma, which shows that the distance between $\frac{n^{2}}{N}G_{n}$ and $\frac{n^{2}}{N}\mathbbm{E}G_{n}$ goes to $0$ almost surely with respect to $\operatorname{\mathbbm{P}}$. The key ingredient of the proof is a concentration inequality of Paulin (2015).

Lemma 4.1.

We have

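$$\lim_{n\to\infty}d_{\square}\biggl(\frac{n^{2}}{N}G_{n},\frac{n^{2}}{N}\mathbbm{E}G_{n}\biggr)=0\qquad\text{almost surely w.r.t.\ }\operatorname{\mathbbm{P}}.$$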

Proof.

Note that

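$$\begin{split}&d_{\square}\biggl(\frac{n^{2}}{N}G_{n},\frac{n^{2}}{N}\mathbbm{E}G_{n}\biggr)\\ &\qquad=\sup_{\begin{subarray}{c}S=\bigcup_{m=1}^{k}A_{n,i_{m}},\\ T=\bigcup_{m=1}^{l}A_{n,j_{m}}\end{subarray}}\biggl\lvert\sum_{i:i/n\in S}\sum_{j:j/n\in T}\frac{n^{2}}{2N}\bigl(I_{n}(i,j)-\mathbbm{E}I_{n}(i,j)\bigr)\mathop{\mathrm{Vol}}(A_{n,i})\mathop{\mathrm{Vol}}(A_{n,j})\biggr\rvert\\ &\qquad=\sup_{\begin{subarray}{c}S=\bigcup_{m=1}^{k}A_{n,i_{m}},\\ T=\bigcup_{m=1}^{l}A_{n,j_{m}}\end{subarray}}\biggl\lvert f_{S,T}(X)-\mathbbm{E}f_{S,T}(X)\biggr\rvert,\end{split}$$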

where $f_{S,T}(X)=f_{S,T}(X_{0},\dots,X_{N})=\frac{1}{2N}\sum_{i:i/n\in S}\sum_{j:j/n\in T}I_{n}(i,j)$ . As $X$ is Harris recurrent by Assumption (K1), we obtain from (Meyn and Tweedie, 2009, Theorem 16.0.2) that the Markov chain has finite mixing time $t_{mix}$ . Let $\varepsilon>0$ be given. Now, changing one point in $f(X)$ will change $f$ by at most 2 edges; that is, $f$ is $1/N$ -Hamming-Lipschitz. Therefore, by (Paulin, 2015, Corollary 2.10),

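$$\operatorname{\mathbbm{P}}\bigl[\lvert f_{S,T}(X)-\mathbbm{E}f_{S,T}(X)\rvert>\varepsilon\bigr]\leqslant 2\exp\bigl(-NC\varepsilon^{2}\bigr),$$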

where $C$ is a constant that only depends on $t_{mix}$ . Using the union bound,

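$$\begin{split}\operatorname{\mathbbm{P}}\Bigl[\sup_{\begin{subarray}{c}S=\bigcup_{m=1}^{k}A_{n,i_{m}},\\ T=\bigcup_{m=1}^{l}A_{n,j_{m}}\end{subarray}}\lvert f_{S,T}(X)-\mathbbm{E}f_{S,T}(X)\rvert>\varepsilon\Bigr]&\leqslant 2^{2n+1}\exp\bigl(-NC\varepsilon^{2}\bigr)\\ &=\exp\bigl(-NC\varepsilon^{2}+(2n+1)\log 2\bigr).\end{split}$$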

By (3.3) and Borel-Cantelli, the claim follows. ∎
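The last step of the proof is terse, so here is a sketch of how the union bound combines with the growth condition on $N$ and the Borel-Cantelli lemma. The constants $\lambda/2$ and $\lambda/4$ are convenient choices made for this illustration, not taken from the source.

```latex
% There are 2^n choices for S and 2^n for T, hence 2^{2n} pairs, times the factor 2.
% Since N / n^{1+\alpha} \to \lambda, we have N \geqslant \tfrac{\lambda}{2} n^{1+\alpha} for all large n, so
-NC\varepsilon^{2}+(2n+1)\log 2
  \leqslant -\tfrac{\lambda C\varepsilon^{2}}{2}\,n^{1+\alpha}+3n\log 2
  \leqslant -\tfrac{\lambda C\varepsilon^{2}}{4}\,n^{1+\alpha}
  \quad\text{for all large } n,
\qquad\text{hence}\qquad
\sum_{n\geqslant 1}\exp\bigl(-NC\varepsilon^{2}+(2n+1)\log 2\bigr)<\infty .
```

The second inequality uses $\alpha>0$, so that $n^{1+\alpha}$ eventually dominates $n$. By Borel-Cantelli, almost surely the supremum exceeds $\varepsilon$ for only finitely many $n$; applying this along $\varepsilon=1/k$, $k\geqslant 1$, gives the almost sure convergence to $0$.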

Our second lemma shows that the distance between $\frac{n^{2}}{N}\mathbbm{E}G_{n}$ and $H_{n}$ goes to $0$. The key ingredient of the proof is an application of the Stein-Chen method.

Lemma 4.2.

We have

$$\lim_{n\to\infty}d_{\square}\biggl(\frac{n^{2}}{N}\mathbbm{E}G_{n},H_{n}\biggr)=0.$$

Proof.

Let

$$E_{n}(i,j)=\sum_{m=1}^{N}I_{m},$$

where $I_{m}=\operatorname{\mathrm{I}}[(X_{m-1},X_{m})\in A_{n,ij}\cup A_{n,ji}]$ with $A_{n,ij}=A_{n,i}\times A_{n,j}$, and note that

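$$\mathbbm{E}E_{n}(i,j)=2N\int_{A_{n,i}}\int_{A_{n,j}}\frac{\kappa(x,y)}{\|\kappa\|_{1}}\,dx\,dy=\frac{2N}{n^{2}}\mu_{n}(i,j).$$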

Clearly, $I_{n}(i,j)=\operatorname{\mathrm{I}}[E_{n}(i,j)>0]$ . Now,

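$$\begin{split}&d_{\square}\Bigl(\frac{n^{2}}{N}\mathbbm{E}G_{n},H_{n}\Bigr)\leqslant d_{1}\Bigl(\frac{n^{2}}{N}\mathbbm{E}G_{n},H_{n}\Bigr)\\ &\qquad=\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{n^{2}}{2N}\mathop{\mathrm{Vol}}(A_{n,i})\mathop{\mathrm{Vol}}(A_{n,j})\bigl\lvert\mathbbm{E}I_{n}(i,j)-\bigl(1-\exp(-\mathbbm{E}E_{n}(i,j))\bigr)\bigr\rvert\\ &\qquad=\frac{1}{2N}\sum_{i=1}^{n}\sum_{j=1}^{n}\bigl\lvert\mathbbm{E}I_{n}(i,j)-\bigl(1-\exp\bigl(-\mathbbm{E}E_{n}(i,j)\bigr)\bigr)\bigr\rvert\\ &\qquad=\frac{1}{2N}\sum_{i=1}^{n}\sum_{j=1}^{n}\lvert\operatorname{\mathbbm{P}}[E_{n}(i,j)=0]-\operatorname{\mathbbm{P}}[Z_{n}(i,j)=0]\rvert,\end{split}$$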

where $Z_{n}(i,j)\stackrel{{\scriptstyle\mathscr{D}}}{{=}}\mathop{\mathrm{Poisson}}\bigl{(}\frac{2N}{n^{2}}\mu_{n}(i,j)\bigr{)}$ . Now, let $E_{n}^{s}(i,j)$ be a random variable having the size-bias distribution of $E_{n}(i,j)$ . Then, the Stein-Chen method (see, for example, (Barbour et al., 1992, Theorem 1.B)) yields

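$$\mathop{d_{\mathrm{TV}}}\bigl(\mathscr{L}(E_{n}(i,j)),\mathop{\mathrm{Poisson}}\bigl({\textstyle\frac{2N}{n^{2}}}\mu_{n}(i,j)\bigr)\bigr)\leqslant\mathbbm{E}\bigl\lvert E_{n}(i,j)-(E^{s}_{n}(i,j)-1)\bigr\rvert,$$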

where $\mathop{d_{\mathrm{TV}}}$ denotes the total variation distance. Note that $\mathbbm{E}I_{m}=p(i,j)$ for all $0\leqslant m\leqslant n$ , hence $\mathbbm{E}I_{m}=\mathbbm{E}I_{m^{\prime}}$ for all $1\leqslant m,m^{\prime}\leqslant n$ . Thus, we can use the standard way to construct the size-bias distribution (see for example Goldstein and Rinott (1996)). To this end, let $M$ be a uniformly chosen index from $1$ to $N$ , independent of all else. It is not difficult to show that $\mathscr{L}(E_{n}(i,j)|I_{M}=1)$ is the size-bias distribution of $E_{n}(i,j)$ . We now construct $E_{n}^{s}(i,j)$ on the same probability space as $E_{n}(i,j)$ in the following way. Consider $M$ as given, and consider a process $X^{\prime}=(X^{\prime}_{0},\dots,X^{\prime}_{N})$ with law

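$$\mathscr{L}(X^{\prime})=\mathscr{L}(X\mid I_{M}=1)=\mathscr{L}\bigl(X\bigm|(X_{M-1},X_{M})\in A_{n,ij}\cup A_{n,ji}\bigr).$$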

Let $I^{\prime}_{m}=\operatorname{\mathrm{I}}[(X^{\prime}_{m-1},X^{\prime}_{m})\in A_{n,ij}\cup A_{n,ji}]$ , and observe that

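$$\mathscr{L}\bigl((I^{\prime}_{m})_{0\leqslant m\leqslant n}\bigr)=\mathscr{L}\bigl((I_{m})_{0\leqslant m\leqslant n}\bigm|I_{M}=1\bigr).$$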

Thus,

$$E_{n}^{s}(i,j)=\sum_{m=1}^{N}I^{\prime}_{m}$$

has the size-bias distribution of $E_{n}(i,j)$ . If $I_{M}=1$ , we can couple the two processes $X$ and $X^{\prime}$ perfectly. If $I_{M}=0$ , we couple the two processes as follows. Condition (3.2) implies that $X$ is Harris-recurrent; that is,

$$\operatorname{\mathbbm{P}}[X_{m+1}\in dy\mid X_{m}=x]\geqslant\delta\,dy.$$

Thus, it is possible to couple $X$ and $X^{\prime}$ such that

[TABLE]

and, similarly,

[TABLE]

We can easily extend the processes $X_{m}$ and $X_{m}^{\prime}$ so that $I_{m}$ and $I^{\prime}_{m}$ are defined for all $m\in\mathbbm{Z}$. Now, let $G_{1}$ and $G_{2}$ be geometric random variables with success probability $\delta$ dominating the coupling time forward and backward in time from $M$ and $M-1$ respectively. Note that we can construct $G_{1}$ and $G_{2}$ such that $(G_{1},G_{2})\perp\!\!\!\perp X$ and $(G_{1},G_{2})\perp\!\!\!\perp X^{\prime}$ (note, however, that $(G_{1},G_{2})\not\perp\!\!\!\perp(X,X^{\prime})$). Then,

[TABLE]

Now,

[TABLE]

Applying this bound to (4.5), we have, for each $i$ and $j$ ,

[TABLE]

In conjunction with (LABEL:tsc) and interchanging summation with integration, we arrive at

[TABLE]

Using (3.3), the claim follows. ∎

Our third lemma shows that the distance between $H_{n}$ and $\kappa/\|\kappa\|_{1}$ goes to $0$. The proof is a basic exercise in real analysis.

Lemma 4.3.

We have

[TABLE]

Proof.

To simplify writing, we introduce the notation ${\bar{\kappa}}\vcentcolon=\kappa/\|\kappa\|_{1}$. Recall that $H_{n}$ is the weighted graph on the vertices $[n]$ with edge weights as in (4.1). Define the graphon $\hat{\kappa}_{n}$ by

[TABLE]

Let $g_{n}$ be the graphon associated with the graph $H_{n}$ , which is given by

[TABLE]

Now,

[TABLE]

By (Borgs et al., 2014b, Lemma 5.6),

[TABLE]

Note that, by Taylor’s approximation, $0\leqslant x-(1-e^{-x})\leqslant\min\{x,x^{2}\}$ for $x>0$ . Hence, we have for any $x,y\in\mathbbm{R}$ that

[TABLE]
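The elementary bound $0\leqslant x-(1-e^{-x})\leqslant\min\{x,x^{2}\}$ invoked in this step can be checked directly. A short derivation, not spelled out in the source, is sketched here for $x\geqslant 0$ with the auxiliary function $g$ introduced only for this purpose.

```latex
% For x >= 0, put g(x) = x - (1 - e^{-x}).
g(0)=0,\qquad g'(x)=1-e^{-x}\geqslant 0 \;\Longrightarrow\; g(x)\geqslant 0,
\qquad
g(x)\leqslant x \;\text{ since } 1-e^{-x}\geqslant 0,
\qquad
g(x)=e^{-x}-1+x\leqslant \tfrac{x^{2}}{2}\leqslant x^{2}
\;\text{ since } e^{-x}\leqslant 1-x+\tfrac{x^{2}}{2}.
```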

Let $\tau>0$ (to be chosen later). For any graphon $h$ , let $h\wedge\tau$ be the graphon defined as $(h\wedge\tau)(x,y)\vcentcolon=h(x,y)\wedge\tau$ and let the graphon $(h)_{n}$ be defined analogously to (4.7). Now,

[TABLE]

By the contraction property,

[TABLE]

Let $\varepsilon>0$ . Then there exists $\tau>0$ such that

[TABLE]

For this choice of $\tau$, as $\min\bigl\{1,\frac{2N}{n^{2}}\hat{\kappa}_{n}(x,y)\bigr\}(\hat{\kappa}\wedge\tau)_{n}(x,y)$ converges to zero pointwise and is bounded by $\tau$, we can use dominated convergence to conclude that there exists $n_{0}>0$ such that

[TABLE]

for all $n\geqslant n_{0}$ . Therefore, applying (4.11)–(4.13) to (4.10), we have that

[TABLE]

As $\varepsilon>0$ was arbitrary, we conclude that

[TABLE]

From (4.8), (4.9), and (4.14) the claim now follows. ∎

We are now ready to prove the main result. It follows immediately from the triangle inequality and the above three lemmas.

Proof of Theorem 3.1.

As indicated above using the triangle inequality, we have

[TABLE]

Application of Lemma 4.1, Lemma 4.2 and Lemma 4.3 completes the proof. ∎

5 Discussion

We conclude this note with some remarks on dense graph sequences and Respondent Driven Sampling.

Dense Graph Sequence.

We have chosen $0<\alpha<1$ so as to ensure that the graph sequence is sparse. If $\alpha=1$, then we obtain a dense graph sequence. In this case as well, convergence in the cut-metric holds, but to a “Poissonised” $\kappa$ in the following sense.

Proposition 5.1.

Under Assumption (K1) and Assumption (N1), and if $\alpha=1$ ,

[TABLE]

almost surely with respect to $\operatorname{\mathbbm{P}}$, where the graphon $\hat{\kappa}$ is given by

[TABLE]

Proof.

The proof follows the same lines as the proof of Theorem 3.1, so we only provide a sketch.

Lemma 4.1 and Lemma 4.2 hold in the case $\alpha=1$ as well. Instead of Lemma 4.3, we have to show

[TABLE]

Define the graphon $f_{n}$ as $f_{n}(x,y)=\lambda^{-1}\bigl(1-e^{-\lambda\hat{\kappa}_{n}(x,y)}\bigr)$. Now,

[TABLE]

Recall that $\lvert e^{-z}-e^{-w}\rvert\leqslant\lvert z-w\rvert$ for all $z,w>0$ . So, for any $x,y\in[0,1]$ ,

[TABLE]

By (Borgs et al., 2014b, Lemma 5.6), $\|\hat{\kappa}_{n}-\bar{\kappa}\|_{1}\to 0$. Hence, using the above, this readily implies

[TABLE]

Note that for $b\geqslant 0$ and $x>0$ ,

[TABLE]

So, for any $x,y\in\mathbbm{R}$ , we have

[TABLE]

As $\bigl{\lvert}n^{2}/(2N)-1/\lambda\bigr{\rvert}\to 0$ , dominated convergence implies

[TABLE]

From this the result follows as in the proof of Theorem 3.1. ∎

We note that the Stein-Chen method plays a critical role in the proof of Lemma 4.2 when $\alpha=1$, as $\frac{N}{n^{2}}\to\lambda>0$; that is, the mean of the Poisson random variable does not converge to $0$, so that moment bounds would not suffice to prove Lemma 4.2.

Respondent Driven Sampling (RDS).

One common approach in RDS to correct for the bias towards high degrees is to ask participants of the study to estimate their own degree and then to weight the participants by the inverse of their reported degree. This procedure is known as multiplicity sampling, and was first used in the context of RDS by Rothbart et al. (1982). What Theorem 3.1 implies in essence is that one could also clump participants together according to general characteristics (such as age, gender, etc.). If the degree of the participants is captured by these characteristics, the bias towards participants with high degrees would disappear.
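The effect of inverse-degree weighting can be sketched on a toy example. The sketch below assumes a one-referral chain that is a simple random walk on a small undirected network, assumes that reported degrees equal true degrees, and works with exact stationary expectations rather than a simulated sample. The network, the trait, and all names are hypothetical.

```python
# Hypothetical undirected network given as adjacency lists (illustration only).
neighbours = {
    0: [1, 2, 3, 4],
    1: [0, 2],
    2: [0, 1],
    3: [0, 4],
    4: [0, 3],
}
degree = {v: len(nb) for v, nb in neighbours.items()}
total_degree = sum(degree.values())

# Stationary distribution of the simple random walk: pi(v) = deg(v) / sum of degrees.
pi = {v: degree[v] / total_degree for v in neighbours}

# Stationarity check: (pi P)(v) = sum over neighbours u of pi(u) / deg(u).
pi_next = {v: sum(pi[u] / degree[u] for u in neighbours[v]) for v in neighbours}

# Hypothetical binary trait carried only by the high-degree vertex 0.
trait = {0: 1, 1: 0, 2: 0, 3: 0, 4: 0}

true_proportion = sum(trait.values()) / len(trait)

# Expected value of the unweighted sample mean under stationarity.
naive_estimate = sum(pi[v] * trait[v] for v in neighbours)

# Ratio of stationary expectations with inverse-degree weights.
weighted_estimate = (
    sum(pi[v] * trait[v] / degree[v] for v in neighbours)
    / sum(pi[v] / degree[v] for v in neighbours)
)

print(true_proportion, naive_estimate, weighted_estimate)
```

Under these assumptions the unweighted expectation overstates the trait carried by the high-degree vertex, while the inverse-degree weights cancel the factor $\deg(v)$ in the stationary distribution and recover the population proportion.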

It was argued by Heckathorn (2007) that multiplicity sampling cannot in general correct for the bias towards nodes with high degree, because of possible differential recruitment: some groups of participants are systematically able to recruit more people than others. Other methods of estimation, including the original estimators of Heckathorn (1997) as well as the clumping procedure proposed in this article, are equally susceptible to differential recruitment bias.

The mathematical reason behind this bias is that the stationary distribution of a one-referral Markov process on a set of types, which is the commonly used mathematical tool to derive RDS estimators, can be different from the stationary distribution of a multi-type branching process with the same transition probabilities if the average number of offspring depends on the type. This was described precisely by Athreya and Röllin (2016), where the two models, a one-referral Markov chain and a Poisson-offspring branching process, show substantially different over-sampling of high-degree vertices in the network. In the one-referral Markov chain case, the over-sampling is exactly proportional to the degree, but in the case of a Poisson number of referrals, it is proportional to a quantity that is harder to calculate (the eigenfunction of the mean replacement measure of the branching process). In practice, differential recruitment bias is typically reduced by limiting the number of referrals, traditionally to no more than three.
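The discrepancy between the two models can be made concrete with two types. The sketch below assumes that the long-run type proportions of a supercritical multi-type branching process are given by the normalised left Perron eigenvector of its mean matrix, and it computes that eigenvector by plain power iteration. The transition probabilities, the mean offspring numbers, and all function names are hypothetical.

```python
def left_perron(M, iterations=2000):
    """Power iteration for the normalised left Perron eigenvector of a positive matrix."""
    n = len(M)
    v = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]
        s = sum(w)
        v = [x / s for x in w]
    return v

def mean_matrix(m, P):
    """Mean matrix diag(m) P: type i has m[i] referrals on average, typed according to P[i]."""
    n = len(P)
    return [[m[i] * P[i][j] for j in range(n)] for i in range(n)]

# Hypothetical referral probabilities between two types (rows sum to 1).
P = [[0.5, 0.5],
     [0.2, 0.8]]

# Hypothetical mean numbers of referrals per type.
equal_recruitment = [2.0, 2.0]
differential_recruitment = [3.0, 1.0]

# One-referral Markov chain: stationary distribution of P.
markov = left_perron(P)

# Branching process with the same transition probabilities.
branching_equal = left_perron(mean_matrix(equal_recruitment, P))
branching_differential = left_perron(mean_matrix(differential_recruitment, P))

print(markov, branching_equal, branching_differential)
```

With equal mean offspring the eigenvector coincides with the stationary distribution of the chain, here $(2/7, 5/7)$. With the hypothetical differential recruitment above it shifts to $(0.4, 0.6)$, so an estimator calibrated on the Markov chain would be biased in this toy example.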

Heckathorn (2007) also proposes a method, called estimation through dual-components, which is supposed to take differential recruitment into account. This is the default method used in the widely used statistical software RDSAT (see Volz et al. (2012)). The basic idea is to estimate the transition probabilities governing the referrals, calculate the proportion of different types one would expect to see in the absence of both bias due to different degrees and bias due to differential recruitment, compare with the actual observed proportions, and then work backwards to find the true proportions in the population. However, the theoretical justifications in Heckathorn (2007) for the details of the procedure are somewhat opaque.

Open Problems.

We conclude the article with a couple of questions that can be explored.

(1)

In Athreya and Röllin (2016) a rigorous framework was set up to handle convergence in dense graph limits. For dense graphs, the theory of graphons (whose range is $[0,1]$) was used to establish the convergence. Graphons in the dense graph setting characterise the limit via convergence of subgraph counts. This characterisation holds under several equivalent metrics. One should be able to prove convergence in the $L_{1}$ metric, as in this article, for the RDS models used in Athreya and Röllin (2016). A possible approach is the one laid out in the proof of (Borgs et al., 2014a, Theorem 2.14).

(2)

As mentioned before, in practice an RDS sample typically comes in the form of a tree rather than a single chain. Hence a multi-type branching process, where the types could represent characteristics such as gender, age, etc., would constitute a more realistic mathematical model. The stationary distribution of such a branching process is difficult to determine analytically in general, but under additional assumptions, such as considering only finitely many types, a numerical approach would definitely be feasible. In this light, it seems that a statistical theory based on branching process theory, rather than Markov chain theory, could put the framework of dual-components from Heckathorn (2007) onto solid ground or even improve on it.

Bibliography


S. Athreya and A. Röllin (2016). Dense graph limits under respondent-driven sampling. Ann. Appl. Probab. 26, 2193–2210.
A. D. Barbour, L. Holst and S. Janson (1992). Poisson approximation. Oxford University Press, New York.
B. Bollobás and O. Riordan (2009). Metrics for sparse graphs. In Surveys in combinatorics 2009, volume 365 of London Math. Soc. Lecture Note Ser., pages 211–287. Cambridge University Press, Cambridge.
C. Borgs, J. T. Chayes, H. Cohn and Y. Zhao (2014a). An $L^{p}$ theory of sparse graph convergence I: limits, sparse random graph models, and power law distributions. arXiv preprint arXiv:1401.2906.
C. Borgs, J. T. Chayes, H. Cohn and Y. Zhao (2014b). An $L^{p}$ theory of sparse graph convergence II: LD convergence, quotients, and right convergence. arXiv preprint arXiv:1408.0744.
L. Goldstein and Y. Rinott (1996). Multivariate normal approximations by Stein’s method and size bias couplings. J. Appl. Probab. 33, 1–17.
D. D. Heckathorn (1997). Respondent-driven sampling: A new approach to the study of hidden populations. Soc. Probl. 44, 174–199.
D. D. Heckathorn (2007). Extensions of respondent-driven sampling: analyzing continuous variables and controlling for differential recruitment. Sociol. Methodol. 37, 151–207.
