AVMiner: Expansible and Semantic-Preserving Anti-Virus Labels Mining Method
Ligeng Chen, Zhongling He, Hao Wu, Yuhang Gong, Bing Mao

TL;DR
AVMiner is an expandable, semantic-preserving system that automatically extracts and ranks vital malware-related tokens from AV labels using NLP and clustering, enhancing malware diagnosis without expert knowledge.
Contribution
The paper introduces AVMiner, a novel system that automatically mines and ranks important tokens from AV labels, updates itself as new samples arrive, and does not rely on expert knowledge.
Findings
Outperforms previous methods on large datasets
Successfully extracts vital malware tokens
Self-updates with new samples
Abstract
With the increase in the variety and quantity of malware, there is an urgent need to speed up the diagnosis and analysis of malware. Extracting malware family-related tokens from AV (Anti-Virus) labels, provided by online anti-virus engines, paves the way for pre-diagnosing malware. Automatically extracting the vital information from AV labels will greatly enhance the detection ability of security enterprises and strengthen the research ability of security analysts. Recent works like AVCLASS and AVCLASS2 try to extract the attributes of malware from AV labels and establish a taxonomy based on expert knowledge. However, due to the uncertain trend of complicated malicious behaviors, a system needs the following abilities to face the challenge: preserving vital semantics, being expansible, and being free from expert knowledge. In this work, we present AVMiner, an expansible malware…
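To make the idea of mining tokens from AV labels concrete, here is a minimal Python sketch. It is not AVMiner's algorithm: it only splits labels into tokens and ranks each token by how many engines mention it. The engine names, the sample labels, and the helpers `tokenize` and `rank_tokens` are all hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical data: engine name -> AV label reported for one sample.
SAMPLE_LABELS = {
    "EngineA": "Trojan.Win32.Zbot.abc",
    "EngineB": "Win32/Zbot.gen!A",
    "EngineC": "Trojan-Spy.Win32.Zbot.xyz",
    "EngineD": "Generic.Malware",
}


def tokenize(label):
    """Lowercase a label and split it on any non-alphanumeric character."""
    return [t for t in re.split(r"[^a-z0-9]+", label.lower()) if t]


def rank_tokens(labels_by_engine, min_len=4):
    """Rank tokens by the number of engines whose label contains them.

    min_len is an illustrative cutoff that drops short variant
    suffixes such as "abc" or "gen"; it is an assumption, not a
    parameter taken from the paper.
    """
    counts = Counter()
    for label in labels_by_engine.values():
        # A set lets each engine vote for a given token at most once.
        counts.update({t for t in tokenize(label) if len(t) >= min_len})
    # Descending count, then alphabetical order for a stable result.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = rank_tokens(SAMPLE_LABELS)
for token, votes in ranking:
    print(token, votes)
```

On this toy input the platform token `win32` ties with the family name `zbot`, which shows that counting alone cannot tell which frequent tokens actually carry family information.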
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Spam and Phishing Detection
