Mining actionable information from security forums: the case of malicious IP addresses
Joobin Gharibshah, Tai Ching Li, Andre Castro, Konstantinos Pelechrinis, Evangelos E. Papalexakis, Michalis Faloutsos

TL;DR
This paper presents a language-independent method that automatically identifies malicious IP addresses in unstructured hacker forum posts by combining behavioral user features with textual analysis. It achieves high precision and uncovers more malicious IPs than existing blacklists.
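Before any IP can be scored, it has to be pulled out of free-form post text. The summary does not describe how the authors do this step. The sketch below shows one minimal, assumed approach in Python: a regex finds dotted-quad candidates, and the standard-library `ipaddress` module rejects invalid ones, such as octets above 255. The function name `extract_ipv4` is hypothetical.

```python
import ipaddress
import re

# Dotted-quad candidates. The lookbehind/lookahead stop us from matching inside
# longer dotted numbers such as version strings ("1.2.3.4.5"), while still
# allowing a sentence-ending period after the address. Octet ranges are
# validated afterwards by ipaddress, not by the regex.
_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\.?\d)")


def extract_ipv4(post_text: str) -> list[str]:
    """Return distinct valid IPv4 addresses in a post, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in _CANDIDATE.finditer(post_text):
        try:
            addr = ipaddress.IPv4Address(match.group())
        except ipaddress.AddressValueError:
            continue  # looks like an IP (e.g. 999.1.1.1) but is not a valid one
        seen.setdefault(str(addr), None)
    return list(seen)


if __name__ == "__main__":
    post = "scan from 203.0.113.7 again, also 203.0.113.7 and 198.51.100.23; ignore 999.1.1.1."
    print(extract_ipv4(post))  # ['203.0.113.7', '198.51.100.23']
```

Forum posters often "defang" addresses (for example `203.0.113[.]7`). This sketch ignores such variants, so a real pipeline would need to normalize them first.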
Contribution
The authors introduce a novel matrix decomposition approach that extracts latent behavioral features of forum users and combines them with textual features of the posts. This enables language-independent detection of malicious IPs without relying on advanced NLP techniques.
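The idea of latent behavioral features can be illustrated with a truncated SVD. This is a stand-in, because the summary says only "matrix decomposition" and does not specify the exact method or input matrix. The sketch assumes a hypothetical users × behaviors matrix, for example post counts per subforum. It represents each user by their coordinates in a low-rank latent space. The function name `latent_user_features` and the sample data are hypothetical.

```python
import numpy as np


def latent_user_features(activity: np.ndarray, rank: int) -> np.ndarray:
    """Return a `rank`-dimensional latent vector for each row (user) of `activity`.

    Truncated SVD: activity ~= U_k diag(s_k) V_k^T; the rows of U_k diag(s_k)
    are the users' coordinates in the k-dimensional latent space.
    """
    if not 1 <= rank <= min(activity.shape):
        raise ValueError("rank must be between 1 and min(activity.shape)")
    u, s, _vt = np.linalg.svd(activity, full_matrices=False)
    # Broadcasting scales each of the first `rank` columns of U by its singular value.
    return u[:, :rank] * s[:rank]


if __name__ == "__main__":
    # Hypothetical data: rows = users, columns = posts each user made in four
    # subforums (a stand-in for whatever behavioural signals are available).
    activity = np.array([
        [12.0, 0.0, 3.0, 0.0],
        [10.0, 1.0, 2.0, 0.0],
        [0.0, 7.0, 0.0, 5.0],
        [1.0, 6.0, 0.0, 4.0],
    ])
    print(latent_user_features(activity, rank=2).shape)  # (4, 2)
```

A low rank compresses many sparse activity counts into a few dense dimensions. In those dimensions, users with similar posting habits (the first two rows versus the last two above) end up close together. Nonnegative factorizations are a common alternative when interpretable, nonnegative factors are preferred.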
Findings
Over 88% precision in identifying malicious IPs across three forums
Detected up to three times more malicious IPs than the VirusTotal blacklist
Collected approximately 600,000 posts from multiple forums for analysis
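Precision, the metric in the first finding, is the fraction of IPs flagged as malicious that actually are malicious according to the evaluation's ground truth, which this summary does not describe. Here TP counts flagged IPs that are malicious and FP counts flagged IPs that are benign:

```latex
\text{precision} = \frac{TP}{TP + FP}
```

A precision above 0.88 therefore means that fewer than 12 in every 100 flagged IPs are false positives.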
Abstract
The goal of this work is to systematically extract information from hacker forums, whose content can generally be described as unstructured: the text of a post does not necessarily follow any writing rules. By contrast, many security initiatives and commercial entities harness readily available public information, but they seem to focus on structured sources. Here, we focus on the problem of identifying malicious IP addresses among the IP addresses reported in the forums. We develop a method that automates the identification of malicious IP addresses, with the design goal of being independent of external sources. A key novelty is that we use a matrix decomposition method to extract latent features of users' behavior, which we combine with textual information from the related posts. A key design feature of our technique is that it…
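The abstract combines latent behavioral features with textual information while aiming for independence from language. One way to obtain text features without language-specific NLP is to use hashed character n-grams. The sketch below assumes that choice, which is not necessarily what the authors use. It builds one feature row per IP mention by concatenating the author's latent vector with the post's text vector. The function names, the trigram size, and the 256-bucket hash dimension are hypothetical.

```python
import zlib

import numpy as np


def char_ngram_vector(text: str, n: int = 3, dim: int = 256) -> np.ndarray:
    """Hashed, L2-normalised character n-gram counts; needs no tokenizer or language model."""
    vec = np.zeros(dim)
    padded = f" {text.lower()} "  # pad so word boundaries form their own n-grams
    for i in range(len(padded) - n + 1):
        # crc32 is deterministic across runs, unlike Python's salted str hash().
        bucket = zlib.crc32(padded[i:i + n].encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def ip_mention_features(user_latent: np.ndarray, post_text: str, dim: int = 256) -> np.ndarray:
    """One row per IP mention: the author's latent behavioural vector + the post's text vector."""
    return np.concatenate([user_latent, char_ngram_vector(post_text, dim=dim)])


if __name__ == "__main__":
    author_latent = np.array([0.8, -0.1])  # hypothetical 2-d behavioural vector of the post's author
    row = ip_mention_features(author_latent, "this ip 203.0.113.7 keeps brute-forcing my ssh")
    print(row.shape)  # (258,)
```

Rows built this way can be stacked and passed to any binary classifier. The summary does not say which classifier the authors use. Character n-grams treat Cyrillic or Chinese posts the same way as English ones, which fits the language-independence goal.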
