Mining actionable information from security forums: the case of malicious IP addresses

Joobin Gharibshah; Tai Ching Li; Andre Castro; Konstantinos Pelechrinis; Evangelos E. Papalexakis; Michalis Faloutsos

arXiv:1804.04800·cs.SI·April 16, 2018

TL;DR

This paper presents a language-independent method to automatically identify malicious IP addresses from unstructured hacker forum posts by combining behavioral user features with textual analysis, achieving high accuracy and uncovering more threats than existing blacklists.

Contribution

The authors introduce a novel matrix decomposition approach that extracts behavioral features from forum users, enabling language-independent detection of malicious IPs without relying on advanced NLP techniques.
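The idea of combining latent behavioral features with simple textual features can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not say which decomposition is used, so a truncated SVD stands in for it here, and the behavioral counters, posts, vocabulary, and function names are all hypothetical. The text side uses plain whitespace tokenization and word counts, in keeping with the goal of avoiding advanced NLP.

```python
import numpy as np


def latent_user_features(behavior, rank):
    """Truncated SVD: one rank-dimensional latent vector per user (row)."""
    u, s, _ = np.linalg.svd(behavior, full_matrices=False)
    return u[:, :rank] * s[:rank]


def bag_of_words(posts, vocabulary):
    """Count each vocabulary word per post using whitespace tokenization."""
    counts = np.zeros((len(posts), len(vocabulary)))
    for i, post in enumerate(posts):
        tokens = post.lower().split()
        for j, word in enumerate(vocabulary):
            counts[i, j] = tokens.count(word)
    return counts


# Hypothetical toy data: rows are users, columns are behavioral counters
# (for example posts written, threads started, replies received, active days).
behavior = np.array([
    [12.0, 3.0, 40.0, 9.0],
    [1.0, 0.0, 2.0, 1.0],
    [30.0, 8.0, 95.0, 20.0],
])

# Hypothetical posts that each mention an IP address, with their authors.
posts = [
    "this ip keeps scanning my server block it",
    "free proxy list use this ip",
    "attack traffic from this ip block it now",
]
post_author = [0, 1, 2]  # index of the user who wrote each post
vocabulary = ["attack", "block", "proxy", "scanning"]

user_latent = latent_user_features(behavior, rank=2)
text_features = bag_of_words(posts, vocabulary)

# One feature row per post: the author's latent vector plus the word counts.
features = np.hstack([user_latent[post_author], text_features])
print(features.shape)
```

Each row of `features` could then be passed to any standard classifier trained on posts whose IP addresses are already labeled; that step is left out because the summary does not specify the classifier.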

Findings

01. Over 88% precision in identifying malicious IPs across three forums

02. Detected up to three times more malicious IPs than the VirusTotal blacklist

03. Collected approximately 600,000 posts from multiple forums for analysis

Abstract

The goal of this work is to systematically extract information from hacker forums, whose content is generally unstructured: the text of a post does not necessarily follow any writing rules. By contrast, many security initiatives and commercial entities harness readily available public information, but they seem to focus on structured sources. Here, we focus on the problem of identifying malicious IP addresses among the IP addresses that are reported in the forums. We develop a method to automate the identification of malicious IP addresses with the design goal of being independent of external sources. A key novelty is that we use a matrix decomposition method to extract latent features of the behavioral information of the users, which we combine with textual information from the related posts. A key design feature of our technique is that it…