Cream Skimming the Underground: Identifying Relevant Information Points from Online Forums

Felipe Moreno-Vera; Mateus Nogueira; Cainã Figueiredo; Daniel Sadoc Menasché; Miguel Bicudo; Ashton Woiwood; Enrico Lovat; Anton Kocheturov; Leandro Pfleger de Aguiar

arXiv:2308.02581·cs.CR·August 8, 2023

TL;DR

This paper introduces a machine learning system that automatically detects and classifies underground forum posts about vulnerabilities, reporting accuracy, precision, and recall above 0.99 and providing insights into the behavior of hacking communities.

Contribution

It presents a supervised random forest model that labels forum threads citing CVEs as Proof-of-Concept, Weaponization, or Exploitation with high accuracy, and offers model interpretability and insights into hacking communities.

Findings

01. Achieved over 0.99 accuracy, precision, and recall in classification.

02. Differentiated between weaponization and exploitation in forum discussions.

03. Provided insights into hacking community profits and behaviors.
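
The first finding reports accuracy, precision, and recall for a three-label task, but this summary does not say how precision and recall are aggregated across labels. The following is a minimal sketch that assumes macro-averaging (an unweighted mean over the three labels); the function names and the toy labels are illustrative and are not taken from the paper.

```python
# Labels named in the abstract; the averaging scheme below is an assumption.
LABELS = ["Proof-of-Concept", "Weaponization", "Exploitation"]

def accuracy(y_true, y_pred):
    """Fraction of threads whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true, y_pred, label):
    """One-vs-rest precision and recall for a single label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def macro_scores(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-label precision and recall."""
    scores = [precision_recall(y_true, y_pred, label) for label in labels]
    n = len(labels)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n

# Toy example: six threads, one Weaponization thread mislabeled as Exploitation.
y_true = ["Proof-of-Concept", "Proof-of-Concept", "Weaponization",
          "Weaponization", "Exploitation", "Exploitation"]
y_pred = ["Proof-of-Concept", "Proof-of-Concept", "Weaponization",
          "Exploitation", "Exploitation", "Exploitation"]

print(accuracy(y_true, y_pred))
print(macro_scores(y_true, y_pred))
```

With a single misclassification out of six threads, all three scores already fall below 0.9, which gives a sense of how few errors a result above 0.99 allows.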

Abstract

This paper proposes a machine learning-based approach for detecting the exploitation of vulnerabilities in the wild by monitoring underground hacking forums. The increasing volume of posts discussing exploitation in the wild calls for an automatic approach to process threads and posts that will eventually trigger alarms depending on their content. To illustrate the proposed system, we use the CrimeBB dataset, which contains data scraped from multiple underground forums, and develop a supervised machine learning model that can filter threads citing CVEs and label them as Proof-of-Concept, Weaponization, or Exploitation. Leveraging random forests, we indicate that accuracy, precision and recall above 0.99 are attainable for the classification task. Additionally, we provide insights into the difference in nature between weaponization and exploitation, e.g., interpreting the output of a…
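
The abstract mentions filtering threads that cite CVEs before labeling them. A minimal sketch of such a filter is shown here, assuming a plain regular-expression match on the standard `CVE-YYYY-NNNN` identifier format; the helper names, the thread dictionary layout, and the example posts are invented for illustration and do not reflect the CrimeBB schema or the authors' code.

```python
import re

# CVE identifiers have the form CVE-<4-digit year>-<4 or more digits>.
CVE_PATTERN = re.compile(r"\bCVE-(\d{4})-(\d{4,})\b", re.IGNORECASE)

def extract_cves(text):
    """Return the sorted, de-duplicated CVE IDs found in text, upper-cased."""
    return sorted({f"CVE-{year}-{num}" for year, num in CVE_PATTERN.findall(text)})

def filter_threads(threads):
    """Keep only threads whose posts cite at least one CVE.

    `threads` maps a hypothetical thread ID to a list of post strings;
    the result maps each retained thread ID to the CVE IDs it cites.
    """
    cited = {}
    for thread_id, posts in threads.items():
        cves = extract_cves(" ".join(posts))
        if cves:
            cited[thread_id] = cves
    return cited

# Invented example posts, used only to exercise the filter.
example_threads = {
    "t1": ["Anyone tested cve-2017-0144 on Win7?", "Works, PoC attached."],
    "t2": ["Selling accounts, PM me."],
    "t3": ["Kit now bundles CVE-2021-44228 and CVE-2017-0144."],
}

print(filter_threads(example_threads))
```

Threads that pass a filter like this would then be handed to the classifier that assigns the Proof-of-Concept, Weaponization, or Exploitation label.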

Code & Models

Repositories

fmorenovr/nlptoolkit (Official)

Taxonomy

Topics: Cybercrime and Law Enforcement Studies · Spam and Phishing Detection · Advanced Malware Detection Techniques