AVClass2: Massive Malware Tag Extraction from AV Labels
Silvia Sebastián, Juan Caballero

TL;DR
AVClass2 automatically extracts clean tags from AV labels, organizes them in an open taxonomy, and keeps that taxonomy up to date, enabling categorization and search across massive malware datasets.
Contribution
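The paper introduces AVClass2, a malware tagging tool that uses, and helps build, an open taxonomy of the concepts found in AV labels. Where prior tools such as AVClass and Euphony extract only family names, AVClass2 also captures malware classes, file properties, and behaviors.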
Findings
Successfully processed 42 million samples.
Enabled advanced malware search capabilities.
Maintained an up-to-date malware knowledge base.
Abstract
Tags can be used by malware repositories and analysis services to enable searches for samples of interest across different dimensions. Automatically extracting tags from AV labels is an efficient approach to categorize and index massive amounts of samples. Recent tools like AVClass and Euphony have demonstrated that, despite the noisy nature of AV labels, it is possible to extract family names from them. However, beyond the family name, AV labels contain much valuable information such as malware classes, file properties, and behaviors. This work presents AVClass2, an automatic malware tagging tool that, given the AV labels for a potentially massive number of samples, extracts clean tags that categorize the samples. AVClass2 uses, and helps build, an open taxonomy that organizes concepts in AV labels, but is not constrained to a predefined set of tags. To keep itself updated as AV vendors…
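The tagging idea in the abstract can be illustrated with a minimal Python sketch. This is not AVClass2's code or its real taxonomy: the `TAXONOMY` table, the `GENERIC` token list, the `extract_tags` function, the `min_vendors` threshold, and the vendor names are all hypothetical, chosen only to show how labels might be tokenized, mapped to taxonomy tags, and ranked by how many AV engines agree. Tokens missing from the taxonomy are kept under an `UNK` category, mirroring the statement that the tool is not constrained to a predefined set of tags.

```python
import re
from collections import Counter

# Hypothetical mini-taxonomy (illustrative only): token -> (category, canonical tag).
TAXONOMY = {
    "trojan": ("CLASS", "trojan"),
    "troj": ("CLASS", "trojan"),
    "downloader": ("CLASS", "downloader"),
    "win32": ("FILE", "windows"),
    "w32": ("FILE", "windows"),
    "zbot": ("FAM", "zbot"),
    "zeus": ("FAM", "zbot"),  # alias mapped to the same family tag
}

# Hypothetical list of tokens assumed to carry no useful information.
GENERIC = {"generic", "gen", "variant", "malware"}


def tokenize(label):
    """Lowercase a label and split it on any non-alphanumeric character."""
    return [t for t in re.split(r"[^a-z0-9]+", label.lower()) if t]


def extract_tags(av_labels, min_vendors=2):
    """Return (category, tag, votes) tuples seen by at least min_vendors engines."""
    votes = Counter()
    for label in av_labels.values():
        seen = set()  # each engine votes at most once per tag
        for tok in tokenize(label):
            if tok in GENERIC or len(tok) < 3 or tok.isdigit():
                continue
            # Unknown tokens are kept as UNK tags instead of being discarded.
            seen.add(TAXONOMY.get(tok, ("UNK", tok)))
        votes.update(seen)
    ranked = [(cat, tag, n) for (cat, tag), n in votes.items() if n >= min_vendors]
    return sorted(ranked, key=lambda x: (-x[2], x[0], x[1]))


labels = {
    "VendorA": "Trojan.Win32.Zbot.abc",
    "VendorB": "W32/Zeus.gen!A",
    "VendorC": "Troj/Downloader-XY",
}
tags = extract_tags(labels)
print(tags)
```

In this sketch, requiring agreement from at least two engines filters out one-off tokens such as `abc`, while aliases such as `zeus` and `zbot` collapse into a single family tag. An unknown token that many engines agree on would surface as an `UNK` tag, which is one plausible way to spot candidates for extending the taxonomy.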
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Network Security and Intrusion Detection
