Living off the Analyst: Harvesting Features from Yara Rules for Malware Detection
Siddhant Gupta, Fred Lu, Andrew Barlow, Edward Raff, Francis Ferraro, Cynthia Matuszek, Charles Nicholas, and James Holt

TL;DR
This paper explores repurposing existing YARA rules by extracting their sub-signatures as features for malware detection. It demonstrates improved accuracy on the EMBER 2018 dataset and reveals diverse feature behaviors.
Contribution
It introduces a novel method of extracting sub-signatures from YARA rules to create features that improve malware detection capabilities.
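The idea of splitting a rule into sub-signatures can be sketched as follows. This is a minimal illustration, not the authors' extraction pipeline. It assumes that each entry in a rule's `strings:` section is one sub-signature, and that a sample's feature is simply whether that byte pattern occurs in the file. The helpers `extract_sub_signatures` and `featurize` are hypothetical names. The two regexes are a deliberately simplified stand-in for the real YARA grammar: they handle only plain text strings and fixed hex strings that begin a line, and they skip wildcards, jumps, alternatives, regex strings, and modifiers such as `nocase` or `wide`. The annotations need Python 3.9+.

```python
import re

# Hypothetical, simplified patterns; the real YARA grammar is much richer.
# Text strings such as: $name = "literal"  (escaped characters are not handled)
TEXT_STRING = re.compile(r'^\s*\$(\w*)\s*=\s*"([^"\\]*)"', re.MULTILINE)
# Hex strings such as: $name = { 4D 5A ... }, which may contain wildcards or jumps
HEX_STRING = re.compile(r'^\s*\$(\w*)\s*=\s*\{([0-9A-Fa-f\s?\[\]\-|()]+)\}', re.MULTILINE)


def extract_sub_signatures(rule_text: str) -> list[bytes]:
    """Split one YARA rule into its individual string patterns (sub-signatures)."""
    patterns = []
    for _name, literal in TEXT_STRING.findall(rule_text):
        patterns.append(literal.encode("latin-1"))
    for _name, body in HEX_STRING.findall(rule_text):
        tokens = body.split()
        # Keep only fixed byte sequences; wildcards (??), jumps ([2-4]) and
        # alternatives (AA | BB) are skipped in this sketch.
        if tokens and all(re.fullmatch(r"[0-9A-Fa-f]{2}", t) for t in tokens):
            patterns.append(bytes.fromhex("".join(tokens)))
    return patterns


def featurize(sample: bytes, patterns: list[bytes]) -> list[int]:
    """Binary feature vector: 1 if a sub-signature occurs anywhere in the sample.
    Modifiers such as nocase or wide are ignored, so matching is on exact bytes."""
    return [1 if p in sample else 0 for p in patterns]


RULE = """
rule Example
{
    strings:
        $a = "cmd.exe" nocase
        $b = "VirtualAlloc"
        $h = { 4D 5A 90 00 }
        $j = { 6A 40 [2-4] 68 }
    condition:
        any of them
}
"""

patterns = extract_sub_signatures(RULE)
print(patterns)  # [b'cmd.exe', b'VirtualAlloc', b'MZ\x90\x00']
print(featurize(b"MZ\x90\x00...VirtualAlloc...", patterns))  # [0, 1, 1]
```

Here the whole rule would fire only on its full condition. After splitting, each surviving string becomes its own feature, so a sample that matches just one string still yields a usable signal for a downstream classifier.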
Findings
Extracted sub-signatures improve detection accuracy beyond traditional features.
Features exhibit a power-law distribution, with both highly specific and generic behaviors.
Sub-signatures include dual-purpose and broadly generic indicators.
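One rough way to look for the power-law pattern described above is to count how many samples each binary sub-signature feature fires on, rank those counts, and check whether log(frequency) falls roughly linearly with log(rank). The sketch below shows that check. It is an illustrative assumption, not the paper's manual analysis. `feature_frequencies` and `loglog_slope` are hypothetical helpers, and a least-squares line on log-log axes is only a crude diagnostic, not a rigorous power-law fit. Under this reading, the head of the ranking holds the generic features that fire on many samples, and the long tail holds the specific ones.

```python
import math


def feature_frequencies(matrix: list[list[int]]) -> list[int]:
    """Count how many samples each binary feature fires on, sorted high to low."""
    counts = [sum(column) for column in zip(*matrix)]
    return sorted(counts, reverse=True)


def loglog_slope(freqs: list[int]) -> float:
    """Least-squares slope of log(frequency) against log(rank).

    A roughly straight line with negative slope on log-log axes is a crude
    hint of power-law behavior. Zero counts are dropped because log(0) is
    undefined; they sit at the end of a descending list, so ranks stay intact.
    """
    points = [(math.log(rank), math.log(f))
              for rank, f in enumerate(freqs, start=1) if f > 0]
    if len(points) < 2:
        raise ValueError("need at least two non-zero frequencies")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var


# Rows are samples, columns are sub-signature features (toy data).
matrix = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
]
print(feature_frequencies(matrix))          # [3, 1, 1]
print(round(loglog_slope([36, 9, 4]), 6))   # -2.0
```

The counts 36, 9, and 4 follow 36 / rank² exactly, so the fitted slope is -2. Real feature counts would only approximate a straight line.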
Abstract
A strategy used by malicious actors is to "live off the land," where benign systems and tools already available on a victim's systems are used and repurposed for the malicious actor's intent. In this work, we ask if there is a way for anti-virus developers to similarly re-purpose existing work to improve their malware detection capability. We show that this is plausible via YARA rules, which use human-written signatures to detect specific malware families, functionalities, or other markers of interest. By extracting sub-signatures from publicly available YARA rules, we assembled a set of features that can more effectively discriminate malicious samples from benign ones. Our experiments demonstrate that these features add value beyond traditional features on the EMBER 2018 dataset. Manual analysis of the added sub-signatures shows a power-law behavior in a combination of features that…
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Spam and Phishing Detection
Methods: Sparse Evolutionary Training
