Studying Ranking-Incentivized Web Dynamics
Ziv Vasilisky, Moshe Tennenholtz, Oren Kurland

TL;DR
This paper introduces a new dataset derived from the Internet Archive's WayBack Machine to empirically analyze how authors manipulate web content to influence rankings, confirming behaviors observed in controlled experiments.
Contribution
The paper provides an initial publicly available dataset for the empirical study of ranking-incentivized web dynamics, enabling theoretical models to be validated against real-world data.
Findings
Authors tend to mimic highly ranked documents' content.
Content mimicry can lead to improved document rankings.
Empirical results align with previous controlled experiment findings.
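One way to make the mimicry finding concrete is to check whether a later snapshot of a page is more similar to a highly ranked document than an earlier snapshot was. The sketch below uses bag-of-words cosine similarity, which is an assumption for illustration; this summary does not state which similarity measures the paper uses, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lowercased bag-of-words term counts (a simplifying assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mimicry_shift(before: str, after: str, top_ranked: str) -> float:
    """Change in similarity to the top-ranked document between two snapshots.

    A positive value means the later snapshot moved closer to the
    top-ranked document; a negative value means it moved away.
    """
    top = term_vector(top_ranked)
    return cosine(term_vector(after), top) - cosine(term_vector(before), top)
```

A positive `mimicry_shift` across many documents and queries would be consistent with the first finding, though on its own it does not show that the change was made for ranking reasons.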
Abstract
The ranking incentives of many authors of Web pages play an important role in the Web dynamics. That is, authors who opt to have their pages highly ranked for queries of interest, often respond to rankings for these queries by manipulating their pages; the goal is to improve the pages' future rankings. Various theoretical aspects of this dynamics have recently been studied using game theory. However, empirical analysis of the dynamics is highly constrained due to lack of publicly available datasets. We present an initial such dataset that is based on TREC's ClueWeb09 dataset. Specifically, we used the WayBack Machine of the Internet Archive to build a document collection that contains past snapshots of ClueWeb documents which are highly ranked by some initial search performed for ClueWeb queries. Temporal analysis of document changes in this dataset reveals that findings recently…
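The abstract describes collecting past snapshots of documents from the Wayback Machine. As a rough illustration of how such snapshots can be listed, the sketch below builds a query for the Internet Archive's CDX server and turns its JSON rows into snapshot URLs. It is not the authors' pipeline, which is not described in this summary; the helper names, the date range, and the choice of filters are assumptions.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url: str, start: str, end: str) -> str:
    """Build a CDX query listing successful captures of `url` in a date range.

    `start` and `end` are timestamp prefixes such as "2009" or "20090301".
    """
    params = {
        "url": url,
        "from": start,
        "to": end,
        "output": "json",
        "filter": "statuscode:200",  # assumption: keep only successful captures
        "collapse": "digest",        # assumption: skip adjacent identical captures
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_rows(rows: list) -> list:
    """Convert CDX JSON output (header row, then data rows) into dicts."""
    if not rows:
        return []
    header = rows[0]
    return [dict(zip(header, row)) for row in rows[1:]]


def snapshot_url(capture: dict) -> str:
    """Wayback Machine URL for one parsed capture."""
    return f"https://web.archive.org/web/{capture['timestamp']}/{capture['original']}"
```

The query string can be fetched with any HTTP client and the decoded JSON passed to `parse_cdx_rows`; each resulting capture then yields one dated snapshot of the page for temporal comparison.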
