# The Language of Legal and Illegal Activity on the Darknet

**Authors:** Leshem Choshen, Dan Eldad, Daniel Hershcovich, Elior Sulem, Omri Abend

arXiv: 1905.05543 · 2019-06-05

## TL;DR

This study compares the language of legal and illegal drug-related texts on the Darknet with similar content from a clear net website. It finds distinguishing linguistic patterns, such as POS tag distributions and Wikipedia coverage of named entities, that could inform automated monitoring of Darknet activity.

## Contribution

The paper provides an in-depth analysis of the linguistic characteristics of Darknet texts and examines how well off-the-shelf NLP tools perform on this domain, two questions about which little was previously known.

## Key findings

- Texts selling legal and illegal drugs differ from one another, and from the clear net control, in their POS tag distributions.
- The coverage of named entities in Wikipedia also distinguishes legal Darknet, illegal Darknet, and clear net texts.
- These distinguishing features could inform automated tools for monitoring Darknet activity.
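
To make the first finding concrete, here is a minimal sketch of one way to compare POS tag distributions between two corpora. It assumes the texts have already been tagged by some POS tagger, and it uses Jensen-Shannon divergence as the comparison measure. Both the measure and the function names `pos_distribution` and `js_divergence` are illustrative assumptions; this summary does not state which measure the paper uses, and the toy tokens are invented.

```python
from collections import Counter
from math import log2


def pos_distribution(tagged_tokens):
    """Relative frequency of each POS tag in a list of (token, tag) pairs."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two tag distributions.

    Returns 0.0 for identical distributions and 1.0 for distributions
    with no tags in common. This measure is an assumption of the sketch,
    not necessarily the one used in the paper.
    """
    tags = set(p) | set(q)
    # Mixture distribution: the midpoint of p and q.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in tags}

    def kl_to_mixture(a):
        return sum(a[t] * log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Invented toy data, standing in for the output of a POS tagger.
corpus_a = [("pills", "NOUN"), ("ship", "VERB"), ("fast", "ADV"), ("pills", "NOUN")]
corpus_b = [("we", "PRON"), ("ship", "VERB"), ("quality", "ADJ"), ("goods", "NOUN")]

divergence = js_divergence(pos_distribution(corpus_a), pos_distribution(corpus_b))
print(f"JS divergence between tag distributions: {divergence:.3f}")
```

A larger divergence indicates that the two corpora use parts of speech in more different proportions. On real data, the tagged pairs would come from whichever off-the-shelf tagger is being evaluated.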

## Abstract

The non-indexed parts of the Internet (the Darknet) have become a haven for both legal and illegal anonymous activity. Given the magnitude of these networks, scalably monitoring their activity necessarily relies on automated tools, and notably on NLP tools. However, little is known about what characteristics texts communicated through the Darknet have, and how well off-the-shelf NLP tools do on this domain. This paper tackles this gap and performs an in-depth investigation of the characteristics of legal and illegal text in the Darknet, comparing it to a clear net website with similar content as a control condition. Taking drug-related websites as a test case, we find that texts for selling legal and illegal drugs have several linguistic characteristics that distinguish them from one another, as well as from the control condition, among them the distribution of POS tags, and the coverage of their named entities in Wikipedia.
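
The named-entity coverage mentioned at the end of the abstract can be illustrated with a small sketch. It assumes that entity mentions have already been extracted by some NER tool and that a set of Wikipedia article titles is available locally. The function name `wikipedia_coverage`, the case-insensitive exact matching, and the sample strings are hypothetical choices for illustration, not the paper's procedure.

```python
def wikipedia_coverage(entities, wiki_titles):
    """Fraction of distinct entity mentions that match a known article title.

    Matching is case-insensitive and exact after trimming whitespace,
    which is a simplifying assumption of this sketch.
    """
    normalized_titles = {title.strip().lower() for title in wiki_titles}
    distinct = {e.strip().lower() for e in entities if e.strip()}
    if not distinct:
        return 0.0
    covered = sum(1 for e in distinct if e in normalized_titles)
    return covered / len(distinct)


# Invented example: two distinct entities, one of which has a matching title.
titles = ["Bitcoin", "Tor"]
mentions = ["Bitcoin", "bitcoin", "XyzPharma"]
print(wikipedia_coverage(mentions, titles))
```

Low coverage for a corpus would suggest that many of its named entities are absent from Wikipedia, which is the kind of difference the abstract reports between the compared text types.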

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.05543/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1905.05543/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1905.05543/full.md

---
Source: https://tomesphere.com/paper/1905.05543