# Reducing Seed Bias in Respondent-Driven Sampling by Estimating Block Transition Probabilities

**Authors:** Yilin Zhang, Karl Rohe, Sebastien Roch

arXiv: 1812.01188 · 2018-12-21

## TL;DR

This paper introduces a method to reduce seed bias in respondent-driven sampling by estimating block transition probabilities and using them in a post-stratified estimator, improving accuracy in population proportion estimates.

## Contribution

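It presents a method for estimating block transition probabilities from the observed referrals. It then uses these estimates to build a post-stratified estimator that reduces seed bias, is provably $\sqrt{n}$-consistent under a Markov model, and achieves lower RMSE than existing estimators in simulations.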
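One way to read the pipeline described in the abstract is the following. The notation is ours, not the paper's. Let $n_{kl}$ be the number of observed referrals in which a participant in block $k$ recruits someone in block $l$, and let $\hat\mu_k$ be a within-block estimate of the quantity of interest. The paper's exact estimators may differ, for example by using degree-weighted within-block estimates.

$$
\hat P_{kl} = \frac{n_{kl}}{\sum_{l'} n_{kl'}}, \qquad
\hat\pi^\top \hat P = \hat\pi^\top, \quad \sum_k \hat\pi_k = 1, \qquad
\hat\mu_{\mathrm{PS}} = \sum_k \hat\pi_k \, \hat\mu_k
$$

Here $\hat\pi$, the stationary distribution of the estimated block-level chain, serves as the estimate of the block proportions, and $\hat\mu_{\mathrm{PS}}$ is the post-stratified (PS) estimate.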

## Key findings

- Block transition probabilities can be estimated with high accuracy.
- The proposed post-stratified (PS) estimator greatly diminishes seed bias.
- In simulations, the PS estimator has a smaller root mean square error (RMSE) than state-of-the-art estimators.

## Abstract

Respondent-driven sampling (RDS) is a popular approach to study marginalized or hard-to-reach populations. It collects samples from a networked population by incentivizing participants to refer their friends into the study. One major challenge in analyzing RDS samples is seed bias. Seed bias refers to the fact that when the social network is divided into multiple communities (or blocks), the RDS sample might not provide a balanced representation of the different communities in the population, and such imbalance is correlated with the initial participant (or the seed). In this case, the distributions of estimators are typically non-trivial mixtures, which are determined (1) by the seed and (2) by how the referrals transition from one block to another. This paper shows that (1) block-transition probabilities are easy to estimate with high accuracy, and (2) we can use these estimated block-transition probabilities to estimate the stationary distribution over blocks and thus an estimate of the block proportions. This stationary distribution on blocks has previously been used in the RDS literature to evaluate whether the sampling process has appeared to "mix". We use these estimated block proportions in a simple post-stratified (PS) estimator that greatly diminishes seed bias. By aggregating over the blocks/strata in this way, we prove that the PS estimator is $\sqrt{n}$-consistent under a Markov model, even when other estimators are not. Simulations show that the PS estimator has smaller Root Mean Square Error (RMSE) compared to the state-of-the-art estimators.
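The three steps in the abstract (estimate block transitions, take the stationary distribution over blocks, post-stratify) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names are hypothetical. Transition probabilities are taken as row-normalized counts of (recruiter block, recruit block) referral pairs, and plain within-block sample means stand in for whatever within-block estimator the paper actually uses.

```python
import numpy as np


def estimate_transition_matrix(referrals, n_blocks):
    """Hypothetical helper: row-normalized referral counts.

    referrals: iterable of (recruiter_block, recruit_block) integer pairs.
    Entry [k, l] estimates the probability that a block-k recruiter
    refers someone in block l.
    """
    counts = np.zeros((n_blocks, n_blocks))
    for k, l in referrals:
        counts[k, l] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every block needs at least one outgoing referral")
    return counts / row_sums


def stationary_distribution(P):
    """Solve pi^T P = pi^T subject to sum(pi) = 1 via least squares."""
    n = P.shape[0]
    # Stack the balance equations (P^T - I) pi = 0 with the row 1^T pi = 1.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def post_stratified_estimate(blocks, y, block_weights):
    """Weight each block's sample mean of y by its estimated proportion.

    Assumption: a plain within-block mean; the paper may use a
    degree-weighted within-block estimate instead.
    """
    blocks = np.asarray(blocks)
    y = np.asarray(y, dtype=float)
    estimate = 0.0
    for k, w in enumerate(block_weights):
        in_block = blocks == k
        if not in_block.any():
            raise ValueError(f"no sampled participants in block {k}")
        estimate += w * y[in_block].mean()
    return estimate


if __name__ == "__main__":
    # Toy referral pairs: block 0 refers 0,0,1; block 1 refers 1,0,1,1,0.
    referrals = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 0), (1, 1), (1, 1), (1, 0)]
    P = estimate_transition_matrix(referrals, n_blocks=2)  # [[2/3, 1/3], [2/5, 3/5]]
    pi = stationary_distribution(P)                         # [6/11, 5/11]
    # Block label and binary trait of each sampled participant (toy data).
    blocks = [0, 0, 1, 1, 1, 1]
    y = [1, 0, 1, 1, 0, 1]
    print(post_stratified_estimate(blocks, y, pi))          # 6.75/11, about 0.614
```

In this toy example the naive sample mean is 4/6 (about 0.667) because block 1 is over-represented in the sample. Post-stratification instead weights the block means (0.5 and 0.75) by the estimated block proportions 6/11 and 5/11. Solving the stationary equations as one linear system avoids picking out an eigenvector by hand and fails loudly only in the degenerate cases guarded above.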

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.01188/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1812.01188/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/1812.01188/full.md

---
Source: https://tomesphere.com/paper/1812.01188