Quantifying Paedophile Activity in a Large P2P System
Matthieu Latapy, Clémence Magnien, Raphaël Fournier

TL;DR
This study analyzes hundreds of millions of keyword-based queries from the eDonkey P2P system to estimate two statistics: the fraction of queries that are paedophile and the fraction of users who enter such queries.
Contribution
It introduces a paedophile query detection tool whose false positive and false negative rates are established through assessment by experts, and uses the tool and these rates to estimate paedophile activity in a major P2P network.
Findings
Approximately 0.25% of queries are paedophile
Over 0.2% of users enter paedophile queries
The methods offer the most reliable estimates to date
Abstract
Increasing knowledge of paedophile activity in P2P systems is a crucial societal concern, with important consequences on child protection, policy making, and internet regulation. Because of a lack of traces of P2P exchanges and rigorous analysis methodology, however, current knowledge of this activity remains very limited. We consider here a widely used P2P system, eDonkey, and focus on two key statistics: the fraction of paedophile queries entered in the system and the fraction of users who entered such queries. We collect hundreds of millions of keyword-based queries; we design a paedophile query detection tool for which we establish false positive and false negative rates using assessment by experts; with this tool and these rates, we then estimate the fraction of paedophile queries in our data; finally, we design and apply methods for quantifying users who entered such queries. We…
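The abstract describes estimating the fraction of paedophile queries from a detector with known false positive and false negative rates. As a rough illustration of how such rates can be used to correct a raw detection fraction, here is a minimal sketch based on the standard Rogan-Gladen prevalence correction. This is not necessarily the estimator the authors use: the function name `corrected_fraction` is hypothetical, and the sketch assumes the false positive rate is measured over truly negative queries and the false negative rate over truly positive queries, which may differ from the paper's definitions. The numbers in the example are made up for illustration and are not results from the paper.

```python
def corrected_fraction(observed_fraction, false_positive_rate, false_negative_rate):
    """Estimate the true fraction of positive items from a detector's raw output.

    Assumptions (not taken from the paper):
      - false_positive_rate = P(flagged | item is truly negative)
      - false_negative_rate = P(not flagged | item is truly positive)
    Uses the standard Rogan-Gladen correction:
      true = (observed - fpr) / (1 - fpr - fnr)
    """
    sensitivity = 1.0 - false_negative_rate
    specificity = 1.0 - false_positive_rate
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0.0:
        # The detector is no better than chance; the correction is undefined.
        raise ValueError("detector must satisfy sensitivity + specificity > 1")
    estimate = (observed_fraction - false_positive_rate) / denominator
    # Clamp to [0, 1], since sampling noise can push the raw estimate outside.
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    # Hypothetical illustrative values, NOT figures from the paper.
    raw = 0.10   # fraction of queries flagged by the detector
    fpr = 0.05   # assumed false positive rate
    fnr = 0.15   # assumed false negative rate
    print(corrected_fraction(raw, fpr, fnr))
```

With a perfect detector (both rates zero) the function returns the observed fraction unchanged. The correction grows more sensitive to errors in the assessed rates as the true fraction becomes small relative to the false positive rate, which is one reason careful expert assessment of the rates matters.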
