Malware Lineage in the Wild
Irfan Ul Haq, Sergio Chica, Juan Caballero, Somesh Jha

TL;DR
This paper introduces a novel malware lineage method that accurately identifies malware versions in the wild, even when samples are packed and polymorphic, enabling better understanding of malware evolution.
Contribution
It presents the first technique to identify malware versions and shared functions directly from in-the-wild samples, overcoming packing and polymorphism challenges.
Findings
Reduced the input samples to distinct versions by a factor of 26 on average.
Successfully applied to 10 malware families.
Evaluated on 13 open-source programs with high accuracy.
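To make the sample-to-version reduction concrete, here is a minimal sketch of how such a factor could be computed. The summary does not say whether the 26x figure is a mean of per-family ratios or a ratio of totals, so the averaging below is an assumption, and the function names and sample data are hypothetical and not taken from the paper.

```python
from statistics import mean


def reduction_factor(sample_to_version):
    """Ratio of input samples to distinct versions for one family.

    `sample_to_version` maps a sample identifier (e.g. a file hash)
    to the version label assigned to it. Hypothetical helper.
    """
    if not sample_to_version:
        raise ValueError("at least one sample is required")
    n_samples = len(sample_to_version)
    n_versions = len(set(sample_to_version.values()))
    return n_samples / n_versions


def average_reduction(families):
    """Mean of the per-family reduction factors (an assumed definition)."""
    return mean(reduction_factor(mapping) for mapping in families.values())


# Made-up data: six samples that collapse into two versions,
# and four samples that collapse into one version.
families = {
    "family_a": {"s1": "v1", "s2": "v1", "s3": "v1",
                 "s4": "v2", "s5": "v2", "s6": "v2"},
    "family_b": {"t1": "v1", "t2": "v1", "t3": "v1", "t4": "v1"},
}

factor_a = reduction_factor(families["family_a"])   # 6 samples / 2 versions
factor_b = reduction_factor(families["family_b"])   # 4 samples / 1 version
overall = average_reduction(families)
```

A factor of 26 under this definition would mean that, on average, 26 collected samples correspond to a single distinct version.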
Abstract
Malware lineage studies the evolutionary relationships among malware and has important applications for malware analysis. A persistent limitation of prior malware lineage approaches is that they consider every input sample a separate malware version. This is problematic since the majority of malware is packed and the packing process produces many polymorphic variants (i.e., executables with different file hashes) of the same malware version. Thus, many samples correspond to the same malware version and it is challenging to identify distinct malware versions from polymorphic variants. This problem does not manifest in prior malware lineage approaches because they work on synthetic malware, malware that is not packed, or packed malware for which unpackers are available. In this work, we propose a novel malware lineage approach that works on malware samples collected in the wild. Given a set of…
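The point about polymorphic variants can be illustrated with a toy example. The sketch below is not the paper's method, which the excerpt above does not describe: `toy_pack` and `toy_unpack` are hypothetical stand-ins for a packer and its unpacker, and real packers are far more complex and often have no unpacker available. It only shows why file hashes alone cannot separate versions from variants.

```python
import hashlib


def toy_pack(payload: bytes, key: int) -> bytes:
    """Toy 'packer': XOR each byte with a one-byte key and prepend the key."""
    return bytes([key]) + bytes(b ^ key for b in payload)


def toy_unpack(packed: bytes) -> bytes:
    """Inverse of toy_pack: read the key from the first byte and undo the XOR."""
    key = packed[0]
    return bytes(b ^ key for b in packed[1:])


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# One malware version (placeholder bytes), packed three times with different keys.
version_payload = b"code of a single malware version"
samples = [toy_pack(version_payload, key) for key in (0x11, 0x22, 0x33)]

# Each packed sample has its own file hash ...
file_hashes = {sha256_hex(sample) for sample in samples}

# ... yet all of them carry the same underlying code.
payload_hashes = {sha256_hex(toy_unpack(sample)) for sample in samples}

print(len(file_hashes), "distinct file hashes,",
      len(payload_hashes), "distinct payload")
```

Treating each file hash as a version would count three versions here, although only one exists. In the wild no `toy_unpack` is available, which is the difficulty the abstract highlights.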
