TL;DR
AutoYara automates Yara rule generation using a biclustering algorithm over large n-grams, reducing analyst workload and sometimes matching or even outperforming human analysts at malware detection.
Contribution
The paper introduces AutoYara, a biclustering-based method for automatic Yara rule generation that is fast enough to run on low-resource equipment and constructs simple rules more effectively than currently available software.
Findings
AutoYara reduces analyst rule creation time by 44-86%.
AutoYara maintains low false-positive rates while achieving high true-positive rates.
AutoYara can outperform human analysts in certain detection scenarios.
Abstract
Yara rules are a ubiquitous tool among cybersecurity practitioners and analysts. Developing high-quality Yara rules to detect a malware family of interest can be labor- and time-intensive, even for expert users. Few tools exist and relatively little work has been done on how to automate the generation of Yara rules for specific families. In this paper, we leverage large n-grams (n ≥ 8) combined with a new biclustering algorithm to construct simple Yara rules more effectively than currently available software. Our method, AutoYara, is fast, allowing for deployment on low-resource equipment for teams that deploy to remote networks. Our results demonstrate that AutoYara can help reduce analyst workload by producing rules with useful true-positive rates while maintaining low false-positive rates, sometimes matching or even outperforming human analysts. In addition, real-world testing…
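To make the general idea in the abstract concrete, here is a minimal sketch of turning byte n-grams shared across samples into a simple Yara rule. It is not AutoYara's biclustering algorithm: a plain frequency threshold stands in for the clustering step, and the function names, the default n of 8, and the parameters `min_fraction`, `max_features`, and `min_matches` are all assumptions made for illustration.

```python
from collections import Counter


def byte_ngrams(data, n=8):
    """Return the set of distinct byte n-grams in one sample."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def shared_ngrams(samples, n=8, min_fraction=0.5, max_features=5):
    """Pick n-grams that occur in at least min_fraction of the samples.

    Assumption: this threshold is a stand-in for AutoYara's biclustering
    step, which is not reproduced here.
    """
    counts = Counter()
    for sample in samples:
        # Sets ensure each sample counts an n-gram at most once.
        counts.update(byte_ngrams(sample, n))
    threshold = min_fraction * len(samples)
    frequent = [(gram, c) for gram, c in counts.items() if c >= threshold]
    # Most widely shared first; ties broken by byte order for determinism.
    frequent.sort(key=lambda gc: (-gc[1], gc[0]))
    return [gram for gram, _ in frequent[:max_features]]


def to_yara_rule(name, ngrams, min_matches):
    """Render the n-grams as Yara hex strings with an 'N of them' condition."""
    lines = [f"rule {name}", "{", "    strings:"]
    for i, gram in enumerate(ngrams):
        hex_bytes = " ".join(f"{b:02X}" for b in gram)
        lines.append(f"        $s{i} = {{ {hex_bytes} }}")
    lines.append("    condition:")
    lines.append(f"        {min_matches} of them")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy byte strings standing in for malware samples (hypothetical data).
    toy_samples = [b"..HELLOWORLD..", b"##HELLOWORLD##"]
    grams = shared_ngrams(toy_samples, n=8, min_fraction=1.0)
    print(to_yara_rule("demo_family", grams, min_matches=2))
```

A real pipeline would also need to filter out n-grams that are common in benign files to keep false positives low; that step is omitted from this sketch.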
