Would a File by Any Other Name Seem as Malicious?
Andre T. Nguyen, Edward Raff, Aaron Sant-Miller

TL;DR
This paper asks whether a file's name alone can predict whether the file is malicious, and shows that a character-level CNN can classify files as malicious or benign using only the filename.
Contribution
The paper introduces a novel approach that applies character-level CNNs to filenames to predict malware, giving a signal that can be used when content-based analysis is impractical or unavailable.
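To make the idea of a character-level CNN over filenames concrete, the following is a minimal forward-pass sketch in plain Python. It is not the authors' architecture: the vocabulary (printable ASCII), the maximum length of 64, the layer sizes, and the names `encode_filename` and `TinyCharCNN` are all assumptions chosen for illustration, and the weights are random rather than trained, so the scores carry no meaning.

```python
import math
import random

# Assumed vocabulary: printable ASCII. Index 0 is reserved for padding and
# for any character outside the vocabulary.
VOCAB = {chr(c): i + 1 for i, c in enumerate(range(32, 127))}
MAX_LEN = 64  # assumed maximum filename length


def encode_filename(name, max_len=MAX_LEN):
    """Map a filename to a fixed-length list of character indices."""
    ids = [VOCAB.get(ch, 0) for ch in name[:max_len]]
    return ids + [0] * (max_len - len(ids))


class TinyCharCNN:
    """Embedding -> 1D convolution -> ReLU -> max-over-time -> sigmoid.

    Hypothetical illustration with untrained random weights.
    """

    def __init__(self, embed_dim=8, num_filters=4, kernel_size=3, seed=0):
        rng = random.Random(seed)
        vocab_size = len(VOCAB) + 1
        self.embed_dim = embed_dim
        self.kernel_size = kernel_size
        # One embedding vector per character index.
        self.embed = [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
                      for _ in range(vocab_size)]
        # filters[f][k][d]: filter f, kernel position k, embedding dim d.
        self.filters = [[[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
                         for _ in range(kernel_size)]
                        for _ in range(num_filters)]
        self.out_w = [rng.uniform(-0.1, 0.1) for _ in range(num_filters)]
        self.out_b = 0.0

    def score(self, name):
        """Return a value in (0, 1); meaningless until the model is trained."""
        x = [self.embed[i] for i in encode_filename(name)]
        pooled = []
        for filt in self.filters:
            best = 0.0  # starting at 0 applies ReLU before the max over time
            for t in range(len(x) - self.kernel_size + 1):
                act = sum(filt[k][d] * x[t + k][d]
                          for k in range(self.kernel_size)
                          for d in range(self.embed_dim))
                best = max(best, act)
            pooled.append(best)
        logit = sum(w * p for w, p in zip(self.out_w, pooled)) + self.out_b
        return 1.0 / (1.0 + math.exp(-logit))


model = TinyCharCNN()
print(model.score("invoice.pdf.exe"))
```

In practice such a model would be trained on labeled filenames with a deep learning framework; the pure-Python loop here only shows how characters flow through the embedding, convolution, and pooling steps.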
Findings
File names contain predictive information about malware presence.
Character-level CNNs can classify files as malicious or benign from the filename alone.
Filename-based prediction can aid prioritization when file content access is limited.
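The prioritization use case can be sketched as a simple ordering step: given any function that maps a filename to an estimated risk score, process the highest-scoring files first. The function `prioritize` and the placeholder scorer `toy_score` below are hypothetical names for illustration; `toy_score` is an arbitrary stand-in, not the paper's model and not a real detection rule.

```python
import heapq


def prioritize(filenames, score_fn):
    """Return filenames ordered from highest to lowest estimated risk.

    Ties keep their original relative order.
    """
    # Negate the score so the min-heap pops the highest score first;
    # the index breaks ties deterministically.
    heap = [(-score_fn(name), i, name) for i, name in enumerate(filenames)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


def toy_score(name):
    """Arbitrary placeholder scorer, used only to exercise prioritize()."""
    return 1.0 if name.lower().endswith(".exe") else 0.0


queue = prioritize(["notes.txt", "setup.exe", "photo.jpg"], toy_score)
print(queue)  # ['setup.exe', 'notes.txt', 'photo.jpg']
```

A trained filename classifier would take the place of `toy_score`; the ordering logic stays the same regardless of how the scores are produced.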
Abstract
Successful malware attacks on information technology systems can cause millions of dollars in damage, the exposure of sensitive and private information, and the irreversible destruction of data. Anti-virus systems that analyze a file's contents use a combination of static and dynamic analysis to detect and remove/remediate such malware. However, examining a file's entire contents is not always possible in practice, as the volume and velocity of incoming data may be too high, or access to the underlying file contents may be restricted or unavailable. If it were possible to obtain estimates of a file's relative likelihood of being malicious without looking at the file contents, we could better prioritize file processing order and aid analysts in situations where a file is unavailable. In this work, we demonstrate that file names can contain information predictive of the presence of…
