# Characterizing Activity on the Deep and Dark Web

**Authors:** Nazgol Tavabi, Nathan Bartley, Andrés Abeliuk, Sandeep Soni, Emilio Ferrara, Kristina Lerman

arXiv: 1903.00156 · 2019-03-04

## TL;DR

This paper analyzes messages posted to 80 deep and dark web forums over more than a year to uncover discussion topics, track how those topics evolve, and surface hidden similarities between forums and anomalous events.

## Contribution

The paper combines LDA topic modeling with a non-parametric HMM to model how topics evolve across forums, then uses the resulting dynamic patterns to identify similar forums and anomalous events.

## Key findings

- Identified the topics discussed across 80 deep and dark web forums using LDA.
- Modeled how those topics evolve over time and across forums with a non-parametric HMM.
- Surfaced hidden similarities between forums and helped identify anomalous events.

## Abstract

The deep and darkweb (d2web) refers to limited access web sites that require registration, authentication, or more complex encryption protocols to access them. These web sites serve as hubs for a variety of illicit activities: to trade drugs, stolen user credentials, hacking tools, and to coordinate attacks and manipulation campaigns. Despite its importance to cyber crime, the d2web has not been systematically investigated. In this paper, we study a large corpus of messages posted to 80 d2web forums over a period of more than a year. We identify topics of discussion using LDA and use a non-parametric HMM to model the evolution of topics across forums. Then, we examine the dynamic patterns of discussion and identify forums with similar patterns. We show that our approach surfaces hidden similarities across different forums and can help identify anomalous events in this rich, heterogeneous data.
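To make the idea of comparing forums by their topic dynamics concrete, here is a minimal sketch in Python. It is not the paper's method: the non-parametric HMM is replaced by a much simpler stand-in, the Jensen-Shannon divergence between topic distributions. The forum names, the weekly topic counts, and the threshold are hypothetical toy values, and the topic counts are assumed to come from an earlier topic-modeling step such as LDA.

```python
import math

def normalize(counts):
    """Turn raw per-topic message counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalize an all-zero count vector")
    return [c / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def mean_distribution(rows):
    """Average a list of topic distributions component by component."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def anomalous_weeks(weekly, threshold):
    """Indices of weeks whose topic mix diverges from the forum's average mix."""
    baseline = mean_distribution(weekly)
    return [i for i, week in enumerate(weekly)
            if js_divergence(week, baseline) > threshold]

# Hypothetical toy data: weekly message counts over 3 topics for 3 forums.
forums = {
    "forum_a": [[8, 1, 1], [7, 2, 1], [8, 1, 1], [1, 1, 8]],
    "forum_b": [[7, 2, 1], [8, 1, 1], [7, 2, 1], [8, 1, 1]],
    "forum_c": [[1, 8, 1], [1, 7, 2], [2, 7, 1], [1, 8, 1]],
}

# Per-week topic distributions and an average topic profile for each forum.
weekly = {name: [normalize(w) for w in weeks] for name, weeks in forums.items()}
profiles = {name: mean_distribution(rows) for name, rows in weekly.items()}

# Forum similarity: a smaller divergence means more similar topic profiles.
sim_ab = js_divergence(profiles["forum_a"], profiles["forum_b"])
sim_ac = js_divergence(profiles["forum_a"], profiles["forum_c"])

# Anomaly detection: weeks that depart from the forum's own baseline.
flagged = anomalous_weeks(weekly["forum_a"], threshold=0.1)
```

In this toy data, `forum_a` and `forum_b` share a dominant topic, so their profiles sit closer together than those of `forum_a` and `forum_c`, and the last week of `forum_a` stands out because its topic mix shifts abruptly. A static comparison like this ignores the order of the weeks, which is the information a sequence model such as an HMM is meant to capture.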

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.00156/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1903.00156/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1903.00156/full.md

---
Source: https://tomesphere.com/paper/1903.00156