Automatic Yara Rule Generation Using Biclustering

Edward Raff; Richard Zak; Gary Lopez Munoz; William Fleming; Hyrum S. Anderson; Bobby Filar; Charles Nicholas; James Holt

arXiv:2009.03779 · cs.CR · September 9, 2020

TL;DR

AutoYara automates the generation of Yara rules using biclustering over large n-grams. It significantly reduces analyst workload and sometimes matches or surpasses human analysts in malware detection accuracy.

Contribution

The paper introduces AutoYara, a novel biclustering-based method for automatic Yara rule generation. It is fast enough to run on low-resource equipment and constructs simple Yara rules more effectively than currently available software.
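The abstract does not spell out AutoYara's new biclustering algorithm, so the following is only a minimal sketch of the general idea. It assumes a 0/1 matrix whose rows are files from one malware family and whose columns are candidate n-grams (1 means the n-gram occurs in that file). The greedy heuristic below is a stand-in chosen for illustration, not the paper's algorithm, and `greedy_bicluster` and `min_rows` are hypothetical names that do not come from the AutoYara repository.

```python
from typing import List, Set, Tuple


def greedy_bicluster(matrix: List[List[int]], min_rows: int = 2) -> Tuple[Set[int], Set[int]]:
    """Find one all-ones bicluster (a set of rows x a set of columns) in a 0/1 matrix.

    Hypothetical greedy heuristic for illustration only (NOT AutoYara's algorithm):
    1. Seed with the column (n-gram) present in the most rows (files).
    2. Keep only the rows that contain the seed column.
    3. Add every column that is 1 in all kept rows.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    # Support of a column = number of files in which that n-gram appears.
    support = [sum(matrix[r][c] for r in range(n_rows)) for c in range(n_cols)]
    if not support or max(support) < min_rows:
        return set(), set()
    # max() returns the first column with the highest support.
    seed = max(range(n_cols), key=lambda c: support[c])
    rows = {r for r in range(n_rows) if matrix[r][seed] == 1}
    # Columns shared by every selected file form the bicluster's column set.
    cols = {c for c in range(n_cols) if all(matrix[r][c] == 1 for r in rows)}
    return rows, cols


# Usage: rows = files, columns = n-grams (toy data).
matrix = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
rows, cols = greedy_bicluster(matrix)
print(sorted(rows), sorted(cols))  # prints [0, 1, 2] [0, 1]
```

The returned rows are files that share every n-gram in the returned columns, so those n-grams are natural candidates for the strings of a single rule covering that group of files.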

Findings

1. AutoYara reduces analyst rule-creation time by 44–86%.

2. AutoYara maintains low false-positive rates while achieving useful true-positive rates.

3. AutoYara can outperform human analysts in certain detection scenarios.
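To make the true-positive and false-positive rates concrete, here is a hypothetical sketch. It assumes a rule is a list of byte strings with a Yara-style `k of them` condition, and it approximates Yara matching with plain Python substring search. This is not how Yara or AutoYara evaluate rules, and `rule_matches` and `tpr_fpr` are illustrative names only.

```python
from typing import List, Sequence, Tuple


def rule_matches(data: bytes, strings: Sequence[bytes], k: int) -> bool:
    """Approximate a Yara `k of them` condition with plain substring search:
    the rule fires when at least k of its byte strings occur in the file."""
    return sum(1 for s in strings if s in data) >= k


def tpr_fpr(strings: Sequence[bytes], k: int,
            family: List[bytes], benign: List[bytes]) -> Tuple[float, float]:
    """True-positive rate on family samples and false-positive rate on benign
    files (both lists assumed non-empty). Hypothetical helper for illustration."""
    tp = sum(rule_matches(d, strings, k) for d in family)
    fp = sum(rule_matches(d, strings, k) for d in benign)
    return tp / len(family), fp / len(benign)


# Usage: a two-string rule with k = 1, evaluated on toy data.
strings = [b"ABCDEFGH", b"12345678"]
family = [b"xxABCDEFGHyy", b"no match", b"--12345678--"]
benign = [b"hello world", b"ABCDEFGH"]
print(tpr_fpr(strings, 1, family, benign))  # prints (0.6666666666666666, 0.5)
```

Raising `k` makes the rule stricter, which tends to lower the false-positive rate at the cost of missing some family samples.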

Abstract

Yara rules are a ubiquitous tool among cybersecurity practitioners and analysts. Developing high-quality Yara rules to detect a malware family of interest can be labor- and time-intensive, even for expert users. Few tools exist and relatively little work has been done on how to automate the generation of Yara rules for specific families. In this paper, we leverage large n-grams ($n \geq 8$) combined with a new biclustering algorithm to construct simple Yara rules more effectively than currently available software. Our method, AutoYara, is fast, allowing for deployment on low-resource equipment for teams that deploy to remote networks. Our results demonstrate that AutoYara can help reduce analyst workload by producing rules with useful true-positive rates while maintaining low false-positive rates, sometimes matching or even outperforming human analysts. In addition, real-world testing…
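A minimal sketch of the n-gram side of such a pipeline follows. It assumes byte-level n-grams with $n = 8$ and uses a simple document-frequency threshold as the selection criterion; that criterion is an assumption, since the abstract does not say how n-grams are chosen. The helper names `byte_ngrams`, `frequent_ngrams`, and `to_yara` are hypothetical. The last function renders the selected n-grams as hex strings in a Yara rule with a `k of them` condition.

```python
from collections import Counter
from typing import List, Set


def byte_ngrams(data: bytes, n: int = 8) -> Set[bytes]:
    """All distinct n-byte substrings of a file (empty if the file is shorter than n)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def frequent_ngrams(files: List[bytes], n: int = 8, min_frac: float = 0.5) -> List[bytes]:
    """Hypothetical selection step: keep n-grams that occur in at least
    min_frac of the files (document frequency, each file counted once)."""
    df: Counter = Counter()
    for data in files:
        df.update(byte_ngrams(data, n))
    threshold = min_frac * len(files)
    return sorted(g for g, c in df.items() if c >= threshold)


def to_yara(rule_name: str, strings: List[bytes], k: int) -> str:
    """Render byte strings as a Yara rule using hex strings and a `k of them` condition."""
    lines = [f"rule {rule_name}", "{", "    strings:"]
    for i, s in enumerate(strings):
        hex_bytes = " ".join(f"{b:02X}" for b in s)
        lines.append(f"        $s{i} = {{ {hex_bytes} }}")
    lines += ["    condition:", f"        {k} of them", "}"]
    return "\n".join(lines)


# Usage: two of three toy "samples" share the 8-gram b"ABCDEFGH".
files = [b"AAAAABCDEFGHZZ", b"ABCDEFGHQQQQ", b"nothing here at all"]
common = frequent_ngrams(files, n=8, min_frac=0.6)
print(to_yara("example_family", common, k=1))
```

Hex strings are used so that arbitrary non-printable bytes survive in the rule text. A practical version would likely also need to discard n-grams that are common in benign software, which this sketch omits.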

Code & Models

Repositories

NeuromorphicComputationResearchProgram/AutoYara