Living off the Analyst: Harvesting Features from Yara Rules for Malware Detection

Siddhant Gupta, Fred Lu, Andrew Barlow, Edward Raff, Francis Ferraro, Cynthia Matuszek, Charles Nicholas, and James Holt

arXiv:2411.18516 · cs.CR · November 28, 2024

Open Access

TL;DR

This paper re-purposes existing YARA rules by extracting their sub-signatures as features for malware detection. The added features improve accuracy on the EMBER 2018 dataset, and their behavior ranges from highly specific to broadly generic.

Contribution

It introduces a novel method of extracting sub-signatures from YARA rules to create features that improve malware detection capabilities.
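To make the idea of harvesting sub-signatures concrete, the sketch below pulls the individual string definitions out of a YARA rule and turns them into binary "does this byte sequence occur in the sample" features. It is a minimal illustration and not the authors' pipeline: the toy rule, the regular expressions, and the function names `extract_subsignatures` and `featurize` are all assumptions made for this example. It handles only plain text strings without escapes and hex strings without wildcards or jumps, and it ignores modifiers such as `nocase` and `wide`.

```python
import re

# Hypothetical toy rule written for this sketch; not taken from the paper.
TOY_RULE = r'''
rule Toy_Example
{
    strings:
        $s1 = "cmd.exe /c"
        $s2 = "toy-marker"
        $h1 = { 4D 5A 90 00 }
    condition:
        any of them
}
'''

# Plain text strings with no escape sequences (escaped strings are skipped).
TEXT_STRING = re.compile(r'\$\w*\s*=\s*"([^"\\\r\n]+)"')
# Hex strings made only of hex digits and whitespace (no ?? wildcards or jumps).
HEX_STRING = re.compile(r'\$\w*\s*=\s*\{([0-9A-Fa-f\s]+)\}')


def extract_subsignatures(rule_text):
    """Return a sorted list of byte patterns defined in the rule text."""
    subsigs = set()
    for match in TEXT_STRING.finditer(rule_text):
        subsigs.add(match.group(1).encode("utf-8"))
    for match in HEX_STRING.finditer(rule_text):
        digits = "".join(match.group(1).split())
        try:
            subsigs.add(bytes.fromhex(digits))
        except ValueError:
            # Odd number of hex digits: skip rather than guess.
            continue
    return sorted(subsigs)


def featurize(sample, subsigs):
    """One binary feature per sub-signature: 1 if it occurs in the sample bytes."""
    return [1 if sig in sample else 0 for sig in subsigs]


subsigs = extract_subsignatures(TOY_RULE)
features = featurize(b"MZ\x90\x00 run cmd.exe /c dir", subsigs)
```

Each sub-signature becomes its own feature, independent of the rule's `condition`, so a classifier could in principle weigh the pieces separately and alongside other features. A real implementation would more likely rely on a proper YARA parser than on regular expressions.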

Findings

1. Extracted sub-signatures improve detection accuracy.

2. Features exhibit a power-law distribution, with both specific and generic behaviors.

3. Sub-signatures include dual-purpose and broadly generic indicators.

Abstract

A strategy used by malicious actors is to "live off the land," where benign systems and tools already available on a victim's systems are used and repurposed for the malicious actor's intent. In this work, we ask if there is a way for anti-virus developers to similarly re-purpose existing work to improve their malware detection capability. We show that this is plausible via YARA rules, which use human-written signatures to detect specific malware families, functionalities, or other markers of interest. By extracting sub-signatures from publicly available YARA rules, we assembled a set of features that can more effectively discriminate malicious samples from benign ones. Our experiments demonstrate that these features add value beyond traditional features on the EMBER 2018 dataset. Manual analysis of the added sub-signatures shows a power-law behavior in a combination of features that…

Taxonomy

Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Spam and Phishing Detection

Methods: Sparse Evolutionary Training