Malware distributions and graph structure of the Web
Sanja Šćepanović, Igor Mishkovski, Jukka Ruohonen, Frederick Ayala-Gómez, Tuomas Aura, Sami Hyrynsalmi

TL;DR
This study analyzes the Web's graph structure to distinguish between clean and malicious websites, revealing differences in local and global properties and assessing their predictive power for malware detection.
Contribution
It is the first large-scale analysis comparing the structural properties of malicious and clean Web pages, bridging Web science and cybersecurity.
Findings
Different theoretical distributions explain the local characteristics and the network properties of websites.
Structural differences help classify malware websites.
Results can improve malware detection algorithms.
Abstract
Knowledge about the graph structure of the Web is important for understanding this complex socio-technical system and for devising proper policies supporting its future development. Knowledge about the differences between clean and malicious parts of the Web is important for understanding potential threats to its users and for devising protection mechanisms. In this study, we apply data science methods to a large crawl of surface and deep Web pages with the aim of increasing such knowledge. To accomplish this, we answer the following questions. Which theoretical distributions explain important local characteristics and network properties of websites? How do these characteristics and properties differ between clean and malicious (malware-affected) websites? What is the predictive power of local characteristics and network properties for classifying malware websites? To the best of our…
Taxonomy
Topics: Advanced Malware Detection Techniques · Spam and Phishing Detection · Network Security and Intrusion Detection
