MagnetDB: A Longitudinal Torrent Discovery Dataset with IMDb-Matched Movies and TV Shows
Scott Seidenberger, Noah Pursell, Anindya Maiti

TL;DR
MagnetDB is a comprehensive, longitudinal dataset of BitTorrent torrents discovered from 2018 to 2024, with file metadata and IMDb matches for movie and TV show torrents, enabling detailed research on piracy trends and distribution dynamics.
Contribution
This paper introduces MagnetDB, the largest longitudinal torrent dataset with IMDb annotations, facilitating new empirical research on digital piracy and content distribution patterns.
Findings
Over 28.6 million torrents collected
Metadata for more than 950 million files
Enables analysis of piracy evolution and distribution trends
Abstract
BitTorrent remains a prominent channel for illicit distribution of copyrighted material, yet the supply side of such content remains understudied. We introduce MagnetDB, a longitudinal dataset of torrents discovered through the BitTorrent DHT between 2018 and 2024, containing more than 28.6 million torrents and metadata for more than 950 million files. While our primary focus is on enabling research on the supply of pirated movies and TV shows, the dataset also encompasses other torrents, both legitimate and illegitimate. By matching movie and TV show torrents to IMDb and annotating them, MagnetDB facilitates detailed analyses of how pirated content evolves in the BitTorrent network. Researchers can leverage MagnetDB to examine distribution trends, subcultural practices, and the gift economy within piracy ecosystems. Through its scale and temporal scope, MagnetDB presents a unique…
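The abstract does not say how torrent names are matched to IMDb entries. As a rough illustration only, the Python sketch below assumes one common heuristic: clean a scene-style release name (dots and underscores as separators), cut it at the first season/episode or quality token, take the trailing year, and compare the normalized title (plus year, for movies) against a catalog. The regexes, the `kind` labels, the function names, and the placeholder catalog with made-up IDs are all hypothetical; this is not MagnetDB's actual pipeline.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical token patterns; a real matcher would need a much larger list.
QUALITY = re.compile(
    r"\b(?:2160p|1080p|720p|480p|bluray|web-?dl|webrip|hdtv|x264|x265|hevc|dvdrip)\b",
    re.IGNORECASE,
)
EPISODE = re.compile(r"\bS(\d{1,2})E(\d{1,3})\b", re.IGNORECASE)
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse non-alphanumeric runs to single spaces."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def parse_release_name(name: str) -> dict:
    """Split a scene-style torrent name into title, year, season, and episode."""
    cleaned = re.sub(r"[._]+", " ", name)
    # The title ends before the first episode marker or quality token.
    cut = len(cleaned)
    episode = EPISODE.search(cleaned)
    quality = QUALITY.search(cleaned)
    for m in (episode, quality):
        if m and m.start() < cut:
            cut = m.start()
    # Use the last year before the cut, skipping one at position 0 (e.g. a title like "2012").
    years = [m for m in YEAR.finditer(cleaned, 0, cut) if m.start() > 0]
    year = years[-1] if years else None
    title_end = year.start() if year else cut
    return {
        "title": normalize(cleaned[:title_end]),
        "year": int(year.group()) if year else None,
        "season": int(episode.group(1)) if episode else None,
        "episode": int(episode.group(2)) if episode else None,
    }


def match_imdb(parsed: dict, catalog: list) -> Optional[str]:
    """Return the ID of the first catalog entry agreeing on kind, title, and (for movies) year."""
    kind = "tvSeries" if parsed["episode"] is not None else "movie"
    for entry in catalog:
        if entry["kind"] != kind or normalize(entry["title"]) != parsed["title"]:
            continue
        if kind == "movie" and parsed["year"] is not None and entry["year"] != parsed["year"]:
            continue
        return entry["id"]
    return None


# Placeholder catalog with made-up IDs, for illustration only.
CATALOG = [
    {"id": "tt0000001", "title": "Example Movie", "year": 2019, "kind": "movie"},
    {"id": "tt0000002", "title": "Example Show", "year": 2015, "kind": "tvSeries"},
]
```

For example, `match_imdb(parse_release_name("Example.Movie.2019.1080p.BluRay.x264"), CATALOG)` returns `"tt0000001"`, while the same title with year 2018 finds no match. Exact string comparison keeps the sketch simple; a production matcher would likely need fuzzy matching, alternate titles, and handling of multi-episode or season-pack names.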
Taxonomy
Topics: Data Visualization and Analytics · Explainable Artificial Intelligence (XAI) · Data Analysis with R
