Identification of Malicious Posts on the Dark Web Using Supervised Machine Learning
Sebastião Alves de Jesus Filho, Gustavo Di Giovanni Bernardo, Paulo Henrique Ribeiro Gabriel, Bruno Bogaz Zarpelão, Rodrigo Sanches Miani

TL;DR
This paper applies supervised machine learning to identify malicious posts on Brazilian Portuguese Dark Web forums. It covers dataset creation, labeling, and model evaluation, and reports high accuracy.
Contribution
It introduces three original datasets, a multi-stage labeling process, and a comprehensive evaluation of text representations and classifiers for Dark Web content in Portuguese.
Findings
LightGBM with TF-IDF features achieved high accuracy in detecting malicious posts
Topic modeling validated the model's robustness
First study focusing on Brazilian Portuguese Dark Web posts
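For readers unfamiliar with the TF-IDF representation named in the findings, the following is a minimal sketch of how such term weights can be computed. It uses a plain textbook variant (relative term frequency times the log of inverse document frequency), which is an assumption and not necessarily the weighting used in the paper. The function name `tfidf` and the toy English posts are hypothetical, for illustration only. The sketch covers only the text representation step; the classifier (LightGBM in the paper) would be trained on vectors like these.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (textbook TF-IDF)."""
    # Naive whitespace tokenization; a real pipeline would normalize more.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # tf = relative frequency in this document,
            # idf = log(N / df); terms present in every document get 0.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy posts, not taken from the paper's datasets.
posts = [
    "selling stolen card dumps",
    "selling used bike",
    "card game night",
]
vectors = tfidf(posts)
```

In this toy corpus, "stolen" occurs in only one post, so it receives a higher weight in that post than "selling", which occurs in two. This is the property that makes rare, topic-specific terms stand out to a downstream classifier.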
Abstract
Given the constant growth and increasing sophistication of cyberattacks, cybersecurity can no longer rely solely on traditional defense techniques and tools. Proactive detection of cyber threats has become essential to help security teams identify potential risks and implement effective mitigation measures. Cyber Threat Intelligence (CTI) plays a key role by providing security analysts with evidence-based knowledge about cyber threats. CTI information can be extracted using various techniques and data sources; however, machine learning has proven promising. As for data sources, social networks and online discussion forums are commonly explored. In this study, we apply text mining techniques and machine learning to data collected from Dark Web forums in Brazilian Portuguese to identify malicious posts. Our contributions include the creation of three original datasets, a novel multi-stage…
Taxonomy
Topics: Cybercrime and Law Enforcement Studies · Spam and Phishing Detection · Intelligence, Security, War Strategy
