MASCOT: Analyzing Malware Evolution Through A Well-Curated Source Code Dataset
Bojing Li, Duo Zhong, Dharani Nadendla, Gabriel Terceros, Prajna Bhandar, Raguvir S, Charles Nicholas

TL;DR
This paper presents MASCOT, a comprehensive malware source code dataset and a multi-view genealogy analysis method to understand malware evolution, revealing increasing complexity, standardization, and lineage expansion driven by code reuse.
Contribution
It introduces a curated malware source code dataset and a novel multi-view genealogy analysis approach to study malware evolution and connections.
Findings
Malware exhibits increasing complexity and standardization over time.
Code reuse significantly influences malware lineage expansion.
Despite quality issues, malware development follows mainstream software engineering trends.
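As an illustration of how code reuse between two specimens could be given a strength and a direction, here is a minimal sketch. It is not the paper's genealogy method: the `Specimen` record, its `year` field, the token-shingle Jaccard measure, and the rule that reuse flows from the older specimen to the newer one are all assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Specimen:
    """Hypothetical record for one malware source sample."""
    name: str
    year: int      # assumed first-seen year, used only to orient the edge
    source: str    # raw source code text


def shingles(source: str, k: int = 3) -> set:
    """Return the set of k-token shingles found in a source string."""
    # Crude tokenizer: identifiers, integers, or any single non-space symbol.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def reuse_edge(a: Specimen, b: Specimen, k: int = 3):
    """Return (older_name, newer_name, strength) for a pair of specimens.

    Strength is the Jaccard similarity of the two shingle sets; the edge
    is oriented from the older specimen to the newer one (an assumption).
    """
    sa, sb = shingles(a.source, k), shingles(b.source, k)
    union = sa | sb
    strength = len(sa & sb) / len(union) if union else 0.0
    older, newer = (a, b) if a.year <= b.year else (b, a)
    return older.name, newer.name, strength
```

Calling `reuse_edge` on every pair of specimens would yield a weighted, directed graph. A real analysis would need a language-aware tokenizer and more reliable dating than a single year field.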
Abstract
In recent years, the explosion of malware and extensive code reuse have formed complex evolutionary connections among malware specimens. The rapid pace of development makes it challenging for existing studies to characterize recent evolutionary trends. In addition, intuitive tools to untangle these intricate connections between malware specimens or categories are urgently needed. This paper introduces a manually reviewed malware source code dataset containing 6032 specimens. Building on and extending current research from a software engineering perspective, we systematically evaluate the scale, development costs, code quality, security, and dependencies of modern malware. We further introduce a multi-view genealogy analysis to clarify malware connections: at an overall view, this analysis quantifies the strength and direction of connections among specimens and categories; at a…
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Software Testing and Debugging Techniques
