Identification of Malicious Posts on the Dark Web Using Supervised Machine Learning

Sebastião Alves de Jesus Filho; Gustavo Di Giovanni Bernardo; Paulo Henrique Ribeiro Gabriel; Bruno Bogaz Zarpelão; Rodrigo Sanches Miani

arXiv:2511.23183·cs.CR·December 1, 2025

TL;DR

This paper applies supervised machine learning to identify malicious posts on Dark Web forums written in Brazilian Portuguese. It covers dataset creation, labeling, and model evaluation, and reports high accuracy.

Contribution

The paper introduces three original datasets, a multi-stage labeling process, and a comprehensive evaluation of text representations and classifiers for Dark Web content in Brazilian Portuguese.

Findings

01. LightGBM with TF-IDF achieved high accuracy in detection

02. Topic modeling validated the model's robustness

03. First study focusing on Brazilian Portuguese Dark Web posts
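
To make the TF-IDF representation named in the first finding concrete, here is a minimal, dependency-free sketch of how posts could be turned into weighted term vectors. It is an illustration, not the authors' pipeline: the whitespace tokenizer, the smoothed IDF formula, the L2 normalisation, the function names, and the three toy posts are all assumptions introduced here. In a full pipeline the resulting vectors would be passed to a classifier such as LightGBM, which is left out to keep the example self-contained.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (idf, vectors) where vectors are L2-normalised term->weight dicts.

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1. This is one common
    variant; the weighting used in the paper is not stated here.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return idf, vectors


# Hypothetical toy posts, invented for illustration only.
posts = [
    "vendo cartao clonado com saldo",
    "alguem recomenda um bom livro",
    "vendo acesso a painel com senha",
]
idf, vectors = tfidf_vectors(posts)
```

Terms that occur in fewer posts (for example `livro`) receive a higher IDF than terms shared across posts (for example `vendo`), which is what lets a downstream classifier focus on distinctive vocabulary.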

Abstract

Given the constant growth and increasing sophistication of cyberattacks, cybersecurity can no longer rely solely on traditional defense techniques and tools. Proactive detection of cyber threats has become essential to help security teams identify potential risks and implement effective mitigation measures. Cyber Threat Intelligence (CTI) plays a key role by providing security analysts with evidence-based knowledge about cyber threats. CTI information can be extracted using various techniques and data sources; among the techniques, machine learning has proven promising. As for data sources, social networks and online discussion forums are commonly explored. In this study, we apply text mining techniques and machine learning to data collected from Dark Web forums in Brazilian Portuguese to identify malicious posts. Our contributions include the creation of three original datasets, a novel multi-stage…

Taxonomy

Topics: Cybercrime and Law Enforcement Studies · Spam and Phishing Detection · Intelligence, Security, War Strategy