gMBA: Expression Semantic Guided Mixed Boolean-Arithmetic Deobfuscation Using Transformer Architectures
Youjeong Noh, Joon-Young Paik, Jingun Kwon, Eun-Sun Cho

TL;DR
This paper introduces gMBA, a Transformer-based framework that uses truth tables as semantic representations to improve the deobfuscation of complex Mixed Boolean-Arithmetic (MBA) expressions, with applications to malware analysis.
Contribution
It presents a novel semantic-guided deobfuscation framework that incorporates truth tables into a Transformer encoder-decoder architecture, improving the recovery of original expressions from their obfuscated forms.
Findings
Semantic guidance improves deobfuscation accuracy
Truth tables effectively represent expression semantics
Transformer models outperform traditional methods
Abstract
Mixed Boolean-Arithmetic (MBA) obfuscation protects intellectual property by converting programs into forms that are more complex to analyze. However, MBA has been increasingly exploited by malware developers to evade detection and cause significant real-world problems. Traditional MBA deobfuscation methods often consider these expressions as part of a black box and overlook their internal semantic information. To bridge this gap, we propose a truth table, which is an automatically constructed semantic representation of an expression's behavior that does not rely on external resources. The truth table is a mathematical form that represents the output of an expression for all possible combinations of inputs. We also propose a general and extensible guided MBA deobfuscation framework (gMBA) that modifies a Transformer-based neural encoder-decoder Seq2Seq architecture to incorporate this…
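To make the truth-table idea concrete, here is a minimal Python sketch that evaluates an expression on every combination of 0/1 inputs and collects the outputs. The helper name `truth_table`, the 8-bit word width, and the example expression pair are illustrative assumptions, not details taken from the paper, whose exact construction may differ.

```python
from itertools import product

def truth_table(expr, num_vars, bits=8):
    """Evaluate expr on all 0/1 input combinations (hypothetical helper).

    Outputs are reduced modulo 2**bits to mimic fixed-width arithmetic;
    the 8-bit width is an assumption chosen for illustration.
    """
    mask = (1 << bits) - 1
    return [expr(*inputs) & mask for inputs in product((0, 1), repeat=num_vars)]

# Illustrative pair: an MBA-style rewrite of addition and its simple form.
obfuscated = lambda x, y: (x ^ y) + 2 * (x & y)
simple = lambda x, y: x + y

# Rows follow the input order (0,0), (0,1), (1,0), (1,1).
tt_obfuscated = truth_table(obfuscated, 2)
tt_simple = truth_table(simple, 2)

print(tt_obfuscated)               # [0, 1, 1, 2]
print(tt_obfuscated == tt_simple)  # True
```

The two expressions look different as token sequences but share the same table, which is the kind of semantic signal a guided model could use alongside the raw expression. Agreement on a set of sampled inputs does not by itself prove equivalence for arbitrary expressions.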
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Testing and Debugging Techniques · Adversarial Robustness in Machine Learning
