A Novel Active Learning Approach to Label One Million Unknown Malware Variants

Ahmed Bensaoud; Jugal Kalita

arXiv:2507.02959 · cs.CR · July 8, 2025

TL;DR

This paper introduces two active learning methods for labeling one million unknown malware variants: Inception-V4+PCA combined with several SVM variants, and a Vision Transformer-based Bayesian Neural Network (ViT-BNN), which is reported to give more stable and robust uncertainty estimates.

Contribution

The paper presents an active learning framework that uses ViT-BNN for malware classification, with a focus on uncertainty handling and on applicability to large-scale datasets of unknown malware.

Findings

01

ViT-BNN outperforms traditional models in uncertainty estimation.

02

The proposed methods effectively label large-scale malware datasets.

03

ViT-BNN demonstrates superior stability and robustness.

Abstract

Active learning for classification seeks to reduce the cost of labeling by finding the unlabeled examples about which the current model is least certain and sending them to an annotator or expert to label. Bayesian theory provides a probabilistic view of deep neural network models by placing a prior distribution over the model parameters and estimating uncertainty from the posterior distribution over those parameters. This paper proposes two novel active learning approaches to label one million malware examples belonging to different unknown modern malware families. The first model is Inception-V4+PCA combined with several support vector machine (SVM) algorithms (UTSVM, PSVM, SVM-GSU, TBSVM). The second model is a Vision Transformer-based Bayesian Neural Network (ViT-BNN). Our proposed ViT-BNN is a state-of-the-art active learning approach that differs from current methods and can apply…
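The selection step described in the abstract (pick the unlabeled examples the model is least certain about) can be illustrated with a minimal sketch. This is not the authors' ViT-BNN or their acquisition function; it assumes only that some Bayesian model can produce class probabilities from several stochastic forward passes, and it scores each sample by the entropy of the averaged prediction. The names predictive_entropy, select_queries, and mc_probs are hypothetical, and the probabilities are made-up toy values.

```python
import numpy as np


def predictive_entropy(mc_probs: np.ndarray) -> np.ndarray:
    """Entropy of the posterior-averaged class distribution.

    mc_probs has shape (T, N, C): T stochastic forward passes (approximate
    samples from the posterior over weights), N unlabeled samples, C classes.
    Returns an array of shape (N,), where larger means less certain.
    """
    mean_probs = mc_probs.mean(axis=0)  # average over passes -> (N, C)
    eps = 1e-12                         # guards against log(0)
    return -(mean_probs * np.log(mean_probs + eps)).sum(axis=1)


def select_queries(mc_probs: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k most uncertain samples, most uncertain first."""
    scores = predictive_entropy(mc_probs)
    return np.argsort(-scores)[:k]


# Toy data (hypothetical): 2 passes, 3 unlabeled samples, 3 malware families.
one_pass = np.array([
    [0.98, 0.01, 0.01],   # confident prediction
    [1 / 3, 1 / 3, 1 / 3],  # maximally uncertain prediction
    [0.60, 0.30, 0.10],   # moderately uncertain prediction
])
mc_probs = np.stack([one_pass, one_pass])

# These indices would be sent to the annotator in the next labeling round.
to_label = select_queries(mc_probs, k=2)
print(to_label)
```

In this toy case the uniformly distributed sample ranks first and the confident one is never queried. Entropy is only one possible uncertainty score; the choice here is for illustration and is not taken from the paper.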