Investigating and Comparing Discussion Topics in Multilingual Underground Forums
Mariella Mischinger, Vahid Ghafouri, Sergio Pastrana, Guillermo Suarez-Tangil

TL;DR
This paper presents an unsupervised method to analyze and compare discussion topics in multilingual underground forums, revealing language-based sub-communities and shedding light on how criminal knowledge is shared.
Contribution
It introduces a novel unsupervised approach to cluster semantically related themes in multilingual forums, addressing language barriers and dark jargon in criminal communities.
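As a rough illustration of what grouping semantically related topics across languages can look like, the sketch below clusters a few topic vectors by cosine similarity using threshold-based single-linkage clustering. This is not the authors' pipeline: the topic names, the three-dimensional vectors, the `cluster_topics` helper, and the 0.8 threshold are all hypothetical, and the example assumes the topics have already been embedded in a shared multilingual vector space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_topics(vectors, threshold=0.8):
    """Single-linkage clustering: two topics end up in the same cluster
    if a chain of pairs with cosine similarity >= threshold connects them."""
    names = list(vectors)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical topic vectors (toy values, assumed to share one embedding space).
topic_vectors = {
    "carding (en)": [0.90, 0.10, 0.00],
    "кардинг (ru)": [0.85, 0.15, 0.05],
    "malware (en)": [0.10, 0.90, 0.10],
    "вредоносное ПО (ru)": [0.05, 0.95, 0.10],
    "hosting (en)": [0.10, 0.10, 0.90],
}

clusters = cluster_topics(topic_vectors)
for members in clusters:
    print(members)
```

In this toy setup the English and Russian variants of a theme land in the same cluster because their vectors point in nearly the same direction, while unrelated themes stay apart. With real data, the choice of embedding model and threshold would matter far more than the clustering routine itself.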
Findings
Identified language-specific sub-communities within a criminal forum.
Uncovered dark jargon and its semantic meanings.
Demonstrated the method's potential for understanding criminal discussions.
Abstract
Underground forums play a crucial role in the criminal ecosystem, facilitating the exchange of knowledge and the trade of illegal tools and services. By analyzing the skills, motivations, focus, and operations of cyber-criminals active in these forums, cybersecurity professionals and law enforcement can better understand their tactics, assess the risks they pose to society, and develop more effective countermeasures. A significant challenge in analyzing these forums arises from language barriers, either because they blend different languages or because they use community-specific slang. In this paper, we address this challenge through the use of a combination of unsupervised methods that group together semantically related conversational themes (i.e., topics) into clusters. We apply our methodology to analyze a prolific, invite-only, Russian-English criminal forum that has been…
Taxonomy
Topics: Cybercrime and Law Enforcement Studies · Spam and Phishing Detection · Digital Communication and Language
