BuyTheBy: A dataset of 18,710 text-based paper mill advertisements with 51,812 timestamped prices
Reese AK Richardson, Spencer S Hong, Anna Abalkina

TL;DR
BuyTheBy is a dataset of 18,710 text-based paper mill advertisements with 51,812 timestamped prices, collected from seven businesses operating out of seven different countries, that enables market analysis of academic fraud services.
Contribution
This paper introduces BuyTheBy, a large, annotated dataset of timestamped advertisements for academic fraud services, assembled to support quantitative market studies.
Findings
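The dataset includes 18,710 advertisements, of which 15,839 list prices, with 51,812 timestamped price data points.
It covers seven businesses operating out of seven different countries, with 5,567 unique products in 14 product categories.
Elementary analysis demonstrates the dataset's utility for quantitative study of markets for academic fraud services.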
Abstract
The study of paper mills and similar businesses operating in the market for academic and education fraud services is frustrated by the lack of market price data on their various offerings. Here, we assemble BuyTheBy, a large, annotated dataset of timestamped, text-based paper mill advertisements from seven businesses operating out of seven different countries. The dataset consists of 18,710 individual advertisements, of which 15,839 have prices listed. Among these there are 20,598 positions listed as for sale on 5,567 unique products in 14 different product categories with 51,812 timestamped price data points. We perform elementary analysis of this dataset to demonstrate its utility for quantitative understanding of markets for academic fraud services and suggest future use cases.
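As a rough illustration of how the headline counts relate to one another, the sketch below derives a few ratios from the figures quoted in the abstract. It uses no data beyond those counts; the names `COUNTS` and `summary_ratios` are hypothetical and not part of the dataset, and the choice of denominators is an assumption made for illustration, not an analysis taken from the paper.

```python
# Headline counts quoted in the abstract (the only inputs used here).
COUNTS = {
    "ads": 18_710,           # individual advertisements
    "priced_ads": 15_839,    # advertisements with prices listed
    "positions": 20_598,     # positions listed as for sale
    "products": 5_567,       # unique products
    "price_points": 51_812,  # timestamped price data points
}


def summary_ratios(counts=COUNTS):
    """Derive crude averages from the headline counts (illustrative only)."""
    return {
        "share_of_ads_with_prices": counts["priced_ads"] / counts["ads"],
        "positions_per_priced_ad": counts["positions"] / counts["priced_ads"],
        "price_points_per_product": counts["price_points"] / counts["products"],
    }


if __name__ == "__main__":
    for name, value in summary_ratios().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, roughly 85% of advertisements list a price and each unique product carries about 9.3 price data points on average. These are aggregate averages only and say nothing about how prices are distributed across businesses or product categories.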
