Mining the YARA Ecosystem: From Ad-Hoc Sharing to Data-Driven Threat Intelligence
Dectot--Le Monnier de Gouville Esteban, Mohammad Hamdaqa, Moataz Chouchen

TL;DR
This study analyzes the open-source YARA rule ecosystem, revealing its centralized, outdated, and noisy nature, and advocates for a shift towards data-driven, curated threat intelligence to improve malware detection effectiveness.
Contribution
It provides the first large-scale empirical analysis of the YARA ecosystem, highlighting structural, quality, and operational issues, and offers a dataset and pipeline for future improvements.
Findings
Highly centralized ecosystem with 80% of rules from 10 authors
Repositories show median inactivity of 782 days and a 4.2-year technical lag
Operational effectiveness is hampered by noise, low recall, and bias towards legacy threats
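The centralization and inactivity figures above are simple summary statistics. The sketch below shows one way such numbers could be computed; it is not the authors' pipeline. The function names `top_k_share` and `median_inactivity_days` and the toy inputs are hypothetical, and the way the paper attributes rules to authors or dates repository activity is not stated in this summary.

```python
from collections import Counter
from datetime import date
from statistics import median


def top_k_share(rule_authors, k=10):
    """Fraction of rules attributed to the k most prolific authors.

    rule_authors: iterable with one author label per rule (assumed input format).
    """
    counts = Counter(rule_authors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return top / total


def median_inactivity_days(last_commit_dates, as_of):
    """Median number of days between each repository's last commit and as_of."""
    return median((as_of - d).days for d in last_commit_dates)


# Toy data, invented for illustration only.
authors = ["alice"] * 6 + ["bob"] * 3 + ["carol"]
share = top_k_share(authors, k=2)

last_commits = [date(2024, 1, 1), date(2023, 1, 1), date(2022, 1, 1)]
inactivity = median_inactivity_days(last_commits, as_of=date(2024, 1, 11))
```

Using the median rather than the mean keeps a handful of very old or very fresh repositories from dominating the inactivity figure.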
Abstract
YARA has established itself as the de facto standard for "Detection as Code," enabling analysts and DevSecOps practitioners to define signatures for malware identification across the software supply chain. Despite its pervasive use, the open-source YARA ecosystem remains characterized by ad-hoc sharing and opaque quality. Practitioners currently rely on public repositories without empirical evidence regarding the ecosystem's structural characteristics, maintenance and diffusion dynamics, or operational reliability. We conducted a large-scale mixed-method study of 8.4 million rules mined from 1,853 GitHub repositories. Our pipeline integrates repository mining to map supply chain dynamics, static analysis to assess syntactic quality, and dynamic benchmarking against 4,026 malware and 2,000 goodware samples to measure operational effectiveness. We reveal a highly centralized structure…
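To make the dynamic benchmarking step concrete, the sketch below computes per-rule recall, false positive rate, and precision from a table of rule matches over labeled malware and goodware samples. It is an illustration under assumptions, not the paper's method: the `benchmark` function, the `matches` input format, and the sample identifiers are hypothetical, and the exact metric definitions used in the study are not given in this summary.

```python
def benchmark(matches, malware_ids, goodware_ids):
    """Compute per-rule detection metrics.

    matches: dict mapping rule name -> set of sample ids the rule fired on
             (assumed format; producing it requires running a YARA scanner).
    malware_ids / goodware_ids: ids of the labeled malicious and benign samples.
    """
    malware_ids = set(malware_ids)
    goodware_ids = set(goodware_ids)
    report = {}
    for rule, hits in matches.items():
        tp = len(hits & malware_ids)   # malware samples the rule detected
        fp = len(hits & goodware_ids)  # benign samples the rule wrongly flagged
        report[rule] = {
            "recall": tp / len(malware_ids) if malware_ids else 0.0,
            "fpr": fp / len(goodware_ids) if goodware_ids else 0.0,
            "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        }
    return report


# Toy data, invented for illustration only.
malware = {"m1", "m2", "m3", "m4"}
goodware = {"g1", "g2"}
matches = {
    "rule_a": {"m1", "m2", "g1"},  # fires on two malware and one goodware sample
    "rule_b": set(),               # never fires
}
report = benchmark(matches, malware, goodware)
```

A rule that never fires, like `rule_b` here, scores zero on every metric, which is one way low recall can show up in aggregate results.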
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Software Testing and Debugging Techniques
