SIREN: Software Identification and Recognition in HPC Systems
Thomas Jakobsche, Fredrik Robertsén, Jessica R. Jones, Utz-Uwe Haus, Florina M. Ciorba

TL;DR
SIREN is a framework that enhances software identification in HPC systems by using fuzzy hashing and process metadata, improving observability, security, and system optimization.
Contribution
SIREN introduces a novel process-level data collection framework utilizing fuzzy hashing for reliable software recognition in HPC environments.
Findings
SIREN successfully identifies repeated executions of known applications.
It provides insights into software usage patterns.
It enables similarity-based identification of unknown applications.
Abstract
HPC systems use monitoring and operational data analytics to ensure efficiency, performance, and orderly operations. Application-specific insights are crucial for analyzing the increasing complexity and diversity of HPC workloads, particularly through the identification of unknown software and recognition of repeated executions, which facilitate system optimization and security improvements. However, traditional identification methods based on job or file names are unreliable because users can choose arbitrary names (e.g., a.out). Fuzzy hashing of executables overcomes these limitations: it detects similarities despite changes in executable version or compilation approach while preserving privacy and file integrity. We introduce SIREN, a process-level data collection framework for software identification and recognition. SIREN improves observability in HPC by enabling analysis of process metadata,…
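To make the idea of similarity-preserving hashing concrete, here is a minimal sketch in Python. It is not SIREN's implementation and not the algorithm used by real fuzzy hashing tools such as ssdeep or TLSH; it is a toy stand-in, assuming a bottom-k sketch over hashed byte shingles. The function names `sketch` and `similarity` and the parameters `k` and `keep` are hypothetical, chosen only for illustration.

```python
import hashlib


def sketch(data: bytes, k: int = 8, keep: int = 64) -> frozenset:
    """Return the `keep` smallest 64-bit hashes of all k-byte windows in `data`.

    Only this small set of hashes is stored, not the file contents.
    """
    hashes = {
        int.from_bytes(
            hashlib.blake2b(data[i:i + k], digest_size=8).digest(), "big"
        )
        for i in range(max(len(data) - k + 1, 1))
    }
    return frozenset(sorted(hashes)[:keep])


def similarity(a: frozenset, b: frozenset, keep: int = 64) -> float:
    """Estimate Jaccard similarity of two inputs from their bottom-k sketches."""
    smallest = sorted(a | b)[:keep]
    if not smallest:
        return 1.0
    shared = sum(1 for h in smallest if h in a and h in b)
    return shared / len(smallest)
```

Under these assumptions, two builds of the same program that differ in only a few bytes share most of their smallest window hashes and score close to 1, while unrelated binaries score close to 0. A cryptographic hash of the whole file, by contrast, changes completely after a one-byte edit, which is why exact hashes can confirm identical files but cannot express "nearly the same executable".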
