ClarAVy: A Tool for Scalable and Accurate Malware Family Labeling

Robert J. Joyce; Derek Everett; Maya Fuchs; Edward Raff; James Holt

arXiv:2502.02759·cs.CR·February 6, 2025


TL;DR

ClarAVy is a malware family labeling tool that aggregates antivirus detections with a Variational Bayesian approach, improving accuracy over existing tools while scaling to very large malware datasets.

Contribution

The paper introduces ClarAVy, a novel malware labeling tool that addresses key shortcomings of existing methods through a Bayesian aggregation strategy, enhancing accuracy and scalability.

Findings

1. Achieves 8-12% higher accuracy than prior tools

2. Successfully labels approximately 40 million malicious files

3. Effectively resolves family aliasing and detection parsing issues

Abstract

Determining the family to which a malicious file belongs is an essential component of cyberattack investigation, attribution, and remediation. Performing this task manually is time-consuming and requires expert knowledge. Automated tools that label malware using antivirus detections lack accuracy and/or scalability, making them insufficient for real-world applications. Three pervasive shortcomings in these tools are responsible: (1) incorrect parsing of antivirus detections, (2) errors during family alias resolution, and (3) an inappropriate antivirus aggregation strategy. To address each of these, we created our own malware family labeling tool called ClarAVy. ClarAVy utilizes a Variational Bayesian approach to aggregate detections from a collection of antivirus products into accurate family labels. Our tool scales to enormous malware datasets, and we evaluated it by labeling…
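
The three stages named in the abstract (parsing detections, resolving aliases, aggregating across antivirus products) can be illustrated with a toy sketch. This is not ClarAVy's implementation: the alias table, the generic-token list, the function names, and the sample detection strings are all hypothetical, and the aggregation step is a plain plurality vote standing in for the Variational Bayesian method the paper describes.

```python
import re
from collections import Counter

# Hypothetical alias table and generic-token list, invented for illustration.
ALIASES = {"zbot": "zeus", "wannacrypt": "wannacry"}
GENERIC = {"trojan", "win32", "generic", "malware", "ransom", "variant"}


def candidate_families(detection):
    """Parse one antivirus detection string into candidate family names."""
    tokens = re.split(r"[^a-z0-9]+", detection.lower())
    families = set()
    for tok in tokens:
        # Drop short variant suffixes, numeric IDs, and generic tokens.
        if len(tok) < 4 or tok.isdigit() or tok in GENERIC:
            continue
        # Alias resolution: map known aliases onto one canonical name.
        families.add(ALIASES.get(tok, tok))
    return families


def label_file(detections):
    """Aggregate per-product detections by plurality vote (a simplification)."""
    votes = Counter()
    for detection in detections.values():
        for family in candidate_families(detection):
            votes[family] += 1
    if not votes:
        return None  # no family-like token survived parsing
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    sample = {
        "AV1": "Trojan.Win32.Zbot.a!dha",
        "AV2": "Win32/Zeus.A",
        "AV3": "Ransom:Win32/WannaCrypt",
        "AV4": "Generic.Malware",
    }
    print(label_file(sample))
```

A plurality vote treats every product as equally reliable and independent, which is the kind of aggregation weakness the abstract lists as shortcoming (3); a Bayesian aggregator would instead estimate how trustworthy each product's labels are.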

Code & Models

Repositories

FutureComputing4AI/ClarAVy
Official