TL;DR
This paper presents a static analysis method that uses symbolic execution and hidden Markov models to deobfuscate Windows API calls in malware, achieving high prediction accuracy without executing the code.
Contribution
It introduces a novel static analysis technique combining symbolic execution and HMMs for generic deobfuscation of Windows API calls in malware.
Findings
87.60% API name prediction accuracy
Effective static alternative to dynamic analysis
Handles obfuscated API call patterns
Abstract
A common way to get insight into a malicious program's functionality is to look at which API functions it calls. To complicate the reverse engineering of their programs, malware authors deploy API obfuscation techniques that hide these calls from analysts' eyes and anti-malware scanners. This problem can be partially addressed by using dynamic analysis; that is, by executing a malware sample in a controlled environment and logging the API calls. However, malware that is aware of virtual machines and sandboxes might terminate without showing any signs of malicious behavior. In this paper, we introduce a static analysis technique allowing generic deobfuscation of Windows API calls. The technique uses symbolic execution and hidden Markov models to predict API names from the arguments passed to the API functions. Our best prediction model can correctly identify API names with 87.60% accuracy.
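The idea of predicting an API name from its arguments can be illustrated with a toy sketch. This is not the paper's implementation: it assumes one small hidden Markov model per candidate API, scores an argument sequence with the forward algorithm, and picks the most likely API. The argument tokens (STR, INT, NULL, PTR), the two hidden states, and every probability are hypothetical values chosen by hand for illustration; in the paper the argument values would come from symbolic execution and the models would be trained on data.

```python
import math

# Hypothetical hand-set models, one HMM per candidate API.
# Hidden states: "P" (pointer-like argument), "S" (scalar argument).
# Observations: abstract argument tokens such as "STR", "INT", "NULL", "PTR".
MODELS = {
    "CreateFileA": {
        "start": {"P": 0.9, "S": 0.1},
        "trans": {"P": {"P": 0.1, "S": 0.9}, "S": {"P": 0.3, "S": 0.7}},
        "emit": {"P": {"STR": 0.6, "NULL": 0.3, "PTR": 0.1},
                 "S": {"INT": 0.9, "NULL": 0.1}},
    },
    "VirtualAlloc": {
        "start": {"P": 0.9, "S": 0.1},
        "trans": {"P": {"P": 0.05, "S": 0.95}, "S": {"P": 0.05, "S": 0.95}},
        "emit": {"P": {"NULL": 0.7, "PTR": 0.3},
                 "S": {"INT": 1.0}},
    },
}

UNSEEN = 1e-6  # small floor probability for tokens a state never emits


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) using the forward algorithm.

    No scaling is applied, which is acceptable only for short sequences
    such as an API argument list.
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    states = list(start)
    # Initialisation: start probability times emission of the first token.
    alpha = {s: start[s] * emit[s].get(obs[0], UNSEEN) for s in states}
    # Recursion: sum over predecessor states, then emit the next token.
    for token in obs[1:]:
        alpha = {
            s: sum(alpha[p] * trans[p].get(s, 0.0) for p in states)
            * emit[s].get(token, UNSEEN)
            for s in states
        }
    total = sum(alpha.values())
    return math.log(total) if total > 0 else float("-inf")


def predict_api(obs, models=MODELS):
    """Pick the API whose model assigns the highest likelihood to obs."""
    return max(models, key=lambda name: forward_log_likelihood(obs, **models[name]))


if __name__ == "__main__":
    # Argument tokens as they might be recovered at an obfuscated call site.
    print(predict_api(["NULL", "INT", "INT", "INT"]))
    print(predict_api(["STR", "INT", "INT", "NULL", "INT", "INT", "NULL"]))
```

Comparing raw likelihoods across models favours shorter sequences, so a realistic system would need trained parameters and some form of normalisation; the sketch only shows the scoring and selection step.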
