Topical Shifts in the Dark Web: A Longitudinal Analysis of Content from the Cybercrime Ecosystem
Roy Ricaldi, Maximilian Schafer, Philipp Zech, Luca Allodi, Raffaela Groner, Irdin Pekaric

TL;DR
This paper presents a six-year longitudinal analysis of dark web cybercrime websites. Using a novel topic-modeling framework, it finds that a small set of core topics persists while themes evolve gradually.
Contribution
It introduces a new longitudinal topic-modeling framework combining embeddings, clustering, and temporal analysis to study dark web content evolution.
Findings
Approximately 75% of total discussion volume is concentrated in a small set of persistent core topics
Median topic lifespan is 75 months
Short-lived themes account for 3% of activity
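To make the kinds of quantities reported above concrete, the following is a minimal sketch of how volume share and topic lifespan could be computed from topic-labelled snapshots. It is not the authors' implementation. The record layout, the toy data, the definition of lifespan (first to last observed month, inclusive), and the "core" threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (website, month_index, topic_id), one per snapshot.
records = [
    ("a.onion", 0, "drugs"), ("a.onion", 1, "drugs"), ("a.onion", 2, "drugs"),
    ("b.onion", 0, "drugs"), ("b.onion", 3, "carding"), ("b.onion", 4, "carding"),
    ("c.onion", 2, "hype"), ("c.onion", 5, "drugs"),
]

def topic_stats(records):
    """Return {topic: (volume_share, lifespan_months)}.

    Lifespan is assumed to run from the first to the last month in which
    the topic is observed, inclusive.
    """
    months = defaultdict(set)
    volume = defaultdict(int)
    for _site, month, topic in records:
        months[topic].add(month)
        volume[topic] += 1
    total = sum(volume.values())
    return {
        t: (volume[t] / total, max(months[t]) - min(months[t]) + 1)
        for t in volume
    }

stats = topic_stats(records)
median_lifespan = median(span for _share, span in stats.values())

# Assumed threshold: a topic is "core" if it spans at least half the window.
window_months = 6
core_share = sum(
    share for share, span in stats.values() if span >= window_months / 2
)
```

In this toy data one topic spans the whole window and carries most of the volume, while the other two are short-lived. A website-level analysis might instead count distinct websites per topic and month rather than raw snapshots.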
Abstract
The dark web hosts a dynamic ecosystem of cybercrime forums and marketplaces that adapt to law enforcement pressure, technological change, and economic incentives. Prior research has extracted cyber threat intelligence from these platforms using static snapshots, with limited attention to how discussions evolve over time. In this study, we conduct a longitudinal analysis of 25,065 websites on the dark web using 11,403,638 HTML snapshots (approximately 1,245.38 GB) collected over six years. We develop a longitudinal topic-modeling framework combining domain-specific embeddings, density-based clustering, and temporal aggregation to measure topic prevalence and lifecycle at the website level. Our analysis identifies 55 thematic clusters. We find that approximately 75% of total discussion volume is concentrated in a small set of persistent core topics, while short-lived themes account for…
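The density-based clustering step mentioned in the abstract can be illustrated with a small sketch. The excerpt does not name the algorithm used, so DBSCAN stands in here as one common density-based method. The 2-D points are toy placeholders for document embeddings, and `eps` and `min_pts` are arbitrary illustrative values.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, with -1 marking noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # only core points expand the cluster
                queue.extend(nbrs)
    return labels

# Toy 2-D stand-ins for embeddings: two dense groups and one outlier.
embeddings = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (9.0, 0.0),
]
labels = dbscan(embeddings, eps=0.5, min_pts=3)
```

Each dense group becomes one thematic cluster and the isolated point is left as noise. This ability to leave sparse content unassigned is a typical reason to prefer density-based methods when the number of topics is not known in advance.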
