A Long Way to the Top: Significance, Structure, and Stability of Internet Top Lists
Quirin Scheitle, Oliver Hohlfeld, Julien Gamba, Jonas Jelten, Torsten Zimmermann, Stephen D. Strowes, Narseo Vallina-Rodriguez

TL;DR
This paper critically examines the creation, stability, and biases of internet top lists like Alexa, revealing their limitations and impact on research results, and offers guidelines for their cautious use.
Contribution
It provides a comprehensive analysis of the structure, stability, and biases of top lists, and evaluates how they affect research outcomes, questions the community had seldom examined before.
Findings
Measurements based on top lists often overestimate results relative to the general population of domains, in some cases by an order of magnitude.
Some lists exhibit high day-to-day fluctuation.
Using top lists can significantly bias research conclusions.
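The day-to-day fluctuation noted in the findings can be quantified in several ways. The following is a minimal sketch of two common set-based measures, daily churn and Jaccard overlap, applied to two snapshots of a list. The function names and the sample domains are illustrative assumptions and are not taken from the paper or its tooling.

```python
def daily_churn(previous, current):
    """Fraction of domains in `previous` that no longer appear in `current`."""
    prev_set = set(previous)
    if not prev_set:
        return 0.0
    return len(prev_set - set(current)) / len(prev_set)


def jaccard_overlap(list_a, list_b):
    """Size of the intersection divided by the size of the union (rank is ignored)."""
    a, b = set(list_a), set(list_b)
    union = a | b
    if not union:
        return 1.0  # two empty lists are treated as identical
    return len(a & b) / len(union)


# Hypothetical snapshots of the same list on two consecutive days.
day1 = ["example.com", "example.org", "example.net", "example.edu"]
day2 = ["example.com", "example.net", "example.info", "example.org"]

print(daily_churn(day1, day2))      # 0.25: one of four domains dropped out
print(jaccard_overlap(day1, day2))  # 0.6: three shared domains out of five total
```

Both measures ignore rank order, so a list whose membership is stable but whose ranking shifts heavily would still look stable here; a rank-aware measure would be needed to capture that.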
Abstract
A broad range of research areas including Internet measurement, privacy, and network security rely on lists of target domains to be analysed; researchers make use of target lists for reasons of necessity or efficiency. The popular Alexa list of one million domains is a widely used example. Despite their prevalence in research papers, the soundness of top lists has seldom been questioned by the community: little is known about the lists' creation, representativity, potential biases, stability, or overlap between lists. In this study we survey the extent, nature, and evolution of top lists used by research communities. We assess the structure and stability of these lists, and show that rank manipulation is possible for some lists. We also reproduce the results of several scientific studies to assess the impact of using a top list at all, which list specifically, and the date of list…
