On the bias of BFS

Maciej Kurant; Athina Markopoulou; Patrick Thiran

arXiv:1004.1729 · cs.DM · February 23, 2011

TL;DR

This paper characterizes the degree bias of BFS in large graphs, compares it with the bias of other traversal methods, and proposes ways to correct it, validating the results on Facebook social network data.

Contribution

It quantifies the degree bias of BFS, shows that in random graphs $RG(p_k)$ the commonly used traversal methods (BFS, DFS, Forest Fire, and Snowball Sampling) all lead to the same bias, and proposes corrections validated on real social network data.

Findings

1. BFS sampling is biased toward high-degree nodes.
2. In random graphs $RG(p_k)$, all commonly used traversal techniques exhibit the same bias.
3. The bias is amplified in graphs with strong positive assortativity.
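
Findings 1 and 2 can be observed empirically with a small simulation. The sketch below builds a random multigraph with a prescribed degree sequence (a configuration model, assumed here as a stand-in for $RG(p_k)$), runs BFS and DFS from the same start node until 10% of the nodes are sampled, and compares the mean degree of the sampled nodes with the graph-wide mean. The two-valued degree distribution, the 10% budget, and the helpers `configuration_model`, `traverse`, and `mean_degree` are illustrative choices, not taken from the paper.

```python
import random
from collections import deque


def configuration_model(degrees, rng):
    """Random multigraph with (almost) the given degree sequence.

    Edge ends ("stubs") are shuffled and paired at random. Self-loops and
    multi-edges are kept for simplicity; one stub is dropped if the total
    degree is odd.
    """
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    if len(stubs) % 2:
        stubs.pop()
    rng.shuffle(stubs)
    adj = [[] for _ in degrees]
    for a, b in zip(stubs[::2], stubs[1::2]):
        adj[a].append(b)
        adj[b].append(a)
    return adj


def traverse(adj, start, budget, kind):
    """Return up to `budget` nodes in the order BFS ("bfs") or DFS ("dfs") visits them.

    A node is marked visited when it is taken off the frontier, so the same
    loop is a correct BFS with a FIFO queue and a correct iterative DFS with a
    LIFO stack.
    """
    frontier = deque([start])
    visited = set()
    sampled = []
    while frontier and len(sampled) < budget:
        v = frontier.popleft() if kind == "bfs" else frontier.pop()
        if v in visited:
            continue  # already reached along another edge
        visited.add(v)
        sampled.append(v)
        for u in adj[v]:
            if u not in visited:
                frontier.append(u)
    return sampled


def mean_degree(adj, nodes):
    """Average realized degree (self-loops count twice) over `nodes`."""
    nodes = list(nodes)
    return sum(len(adj[v]) for v in nodes) / len(nodes)


rng = random.Random(1)
n = 20_000
# Illustrative degree distribution: 80% of nodes have degree 3, 20% degree 30.
degrees = [3 if rng.random() < 0.8 else 30 for _ in range(n)]
adj = configuration_model(degrees, rng)
start = rng.randrange(n)
budget = n // 10  # stop after 10% of the nodes have been sampled

true_mean = mean_degree(adj, range(n))
bfs_mean = mean_degree(adj, traverse(adj, start, budget, "bfs"))
dfs_mean = mean_degree(adj, traverse(adj, start, budget, "dfs"))
print(f"all nodes: {true_mean:.1f}  BFS: {bfs_mean:.1f}  DFS: {dfs_mean:.1f}")
```

Both traversal means should come out well above the graph-wide mean of roughly 8.4; the exact values depend on the seed. Marking nodes as visited on removal from the frontier, rather than on insertion, is what lets one loop serve as both traversals.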

Abstract

Breadth First Search (BFS) and other graph traversal techniques are widely used for measuring large unknown graphs, such as online social networks. It has been empirically observed that an incomplete BFS is biased toward high degree nodes. In contrast to more studied sampling techniques, such as random walks, the precise bias of BFS has not been characterized to date. In this paper, we quantify the degree bias of BFS sampling. In particular, we calculate the node degree distribution expected to be observed by BFS as a function of the fraction of covered nodes, in a random graph $RG(p_k)$ with a given degree distribution $p_k$. Furthermore, we also show that, for $RG(p_k)$, all commonly used graph traversal techniques (BFS, DFS, Forest Fire, and Snowball Sampling) lead to the same bias, and we show how to correct for this bias. To give a broader perspective, we compare this class of…
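
The idea of an observed degree distribution that depends on the covered fraction $f$ can be illustrated with a simplified model, assumed here for illustration and not transcribed from the paper: in a large random graph, each of a node's $k$ edge ends is reached independently with probability $1 - t$, so a degree-$k$ node remains unsampled with probability $t^k$. Then $f = 1 - \sum_k p_k t^k$, and the degree distribution of sampled nodes is $q_k = p_k (1 - t^k) / f$. Because $\sum_k p_k = 1$, the observed $q_k$ and $f$ alone determine $t$ through $\sum_k f q_k / (1 - t^k) = 1$, which gives a simple correction under the same model. The helpers `solve_t`, `expected_sampled_distribution`, and `correct_sampled_distribution` are hypothetical, and the correction is not necessarily the one proposed in the paper.

```python
def solve_t(p, f, tol=1e-12):
    """Bisection for t in [0, 1] such that 1 - sum_k p_k * t**k == f.

    `p` maps degree k -> p_k. Degrees are assumed to be >= 1, so the covered
    fraction 1 - G(t) falls monotonically from 1 at t = 0 to 0 at t = 1.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        covered = 1.0 - sum(pk * mid**k for k, pk in p.items())
        if covered > f:
            lo = mid  # covering too much: a larger t means fewer nodes reached
        else:
            hi = mid
    return (lo + hi) / 2


def expected_sampled_distribution(p, f):
    """Model prediction q_k = p_k * (1 - t**k) / f for a covered fraction f."""
    t = solve_t(p, f)
    return {k: pk * (1.0 - t**k) / f for k, pk in p.items()}


def correct_sampled_distribution(q, f, tol=1e-12):
    """Invert the model: estimate p_k from observed q_k and covered fraction f.

    Since sum_k p_k = 1 and p_k = f * q_k / (1 - t**k), t solves
    sum_k f * q_k / (1 - t**k) == 1; the left side increases with t.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        total = sum(f * qk / (1.0 - mid**k) for k, qk in q.items())
        if total < 1.0:
            lo = mid
        else:
            hi = mid
    t = (lo + hi) / 2
    return {k: f * qk / (1.0 - t**k) for k, qk in q.items()}


p = {3: 0.8, 30: 0.2}  # illustrative degree distribution, true mean 8.4
f = 0.1                # BFS has covered 10% of the nodes
q = expected_sampled_distribution(p, f)
observed_mean = sum(k * qk for k, qk in q.items())
recovered = correct_sampled_distribution(q, f)
print(q, observed_mean, recovered)
```

With these illustrative numbers, the model's mean degree of sampled nodes is more than twice the true mean of 8.4, and the correction recovers $p_k$ up to bisection error. As $f \to 1$, $t \to 0$ and $q_k \to p_k$, so the bias vanishes once the whole graph is covered.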
