Using Malware Detection Techniques for HPC Application Classification
Thomas Jakobsche, Florina M. Ciorba

TL;DR
This paper introduces a novel application classification method for HPC systems using fuzzy hash similarity and machine learning, achieving high accuracy in identifying applications and improving security and resource management.
Contribution
It presents a new fuzzy hash-based classification approach combined with Random Forests for accurate HPC application identification, including unknown samples.
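As a rough illustration of how hash similarity can become classifier input, the sketch below builds one feature per known application class from pairwise similarity scores. It is not the authors' implementation. `difflib.SequenceMatcher` stands in for SSDeep hash comparison, taking the maximum score per class is an assumed aggregation, and the names `similarity`, `class_features`, and `known`, along with the hash strings, are hypothetical. A Random Forest would then be trained on vectors like `features`; that step is omitted here.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """Stand-in similarity score in 0..100.

    Assumption: this uses difflib, NOT the SSDeep comparison algorithm.
    """
    return round(100 * SequenceMatcher(None, a, b).ratio())


def class_features(sample_hash: str, known_hashes: dict) -> list:
    """Return one feature per class, in sorted class order.

    Each feature is the highest similarity between the sample and any
    known hash of that class (assumed aggregation, for illustration).
    """
    return [
        max(similarity(sample_hash, h) for h in known_hashes[label])
        for label in sorted(known_hashes)
    ]


# Hypothetical hash strings for two known application classes.
known = {
    "app_a": ["AAAABBBBCCCC", "AAAABBBBCCCD"],
    "app_b": ["XXXXYYYYZZZZ"],
}

# Feature vector for a sample; low scores everywhere would hint at an
# unknown application.
features = class_features("AAAABBBBCCCC", known)
print(features)
```

Keeping one feature per class gives the classifier a fixed-length input no matter how many reference hashes each class has.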
Findings
Achieved a macro F1-score of 90% on a dataset of 92 application classes and 5333 distinct application samples.
Effectively classifies applications despite input variations and noise.
Enhances security and resource management in HPC environments.
Abstract
HPC systems face security and compliance challenges, particularly in preventing waste and misuse of computational resources by unauthorized or malicious software that deviates from the purpose of its allocation. Existing methods that classify applications based on job names or resource usage are often unreliable, or fail to capture applications whose behavior varies with different inputs or system noise. This research proposes an approach that uses similarity-preserving fuzzy hashes to classify HPC application executables. By comparing the similarity of SSDeep fuzzy hashes, a Random Forest Classifier can accurately label applications executing on HPC systems, including unknown samples. We evaluate the Fuzzy Hash Classifier on a dataset of 92 application classes and 5333 distinct application samples. The proposed method achieved a macro F1-score of 90% (micro F1-score: 89%, weighted F1-score:…
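The abstract reports macro, micro, and weighted F1-scores. The sketch below shows how the three averages differ, using the standard definitions. The function name `f1_averages` and the toy labels are hypothetical and are not taken from the paper's dataset.

```python
from collections import Counter


def f1_averages(y_true, y_pred):
    """Return (macro, micro, weighted) F1 for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(label):
        denom = 2 * tp[label] + fp[label] + fn[label]
        return 2 * tp[label] / denom if denom else 0.0

    per_class = {label: f1(label) for label in labels}
    support = Counter(y_true)

    # Macro: unweighted mean over classes, so rare classes count equally.
    macro = sum(per_class.values()) / len(labels)
    # Micro: pool all counts first, so frequent classes dominate.
    total_tp, total_fp, total_fn = (sum(c.values()) for c in (tp, fp, fn))
    micro = 2 * total_tp / (2 * total_tp + total_fp + total_fn)
    # Weighted: mean over classes, weighted by number of true samples.
    weighted = sum(per_class[l] * support[l] for l in labels) / len(y_true)
    return macro, micro, weighted


# Toy example: three samples of class "a", one of class "b".
macro, micro, weighted = f1_averages(["a", "a", "a", "b"],
                                     ["a", "a", "b", "b"])
print(macro, micro, weighted)
```

With many classes of uneven size, the macro average is the most sensitive to poor performance on small classes, which is one reason it is often reported first.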
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Information and Cyber Security
