How to Kill Symbolic Deobfuscation for Free; or Unleashing the Potential of Path-Oriented Protections
Mathilde Ollivier, S\'ebastien Bardin, Richard Bonichon, Jean-Yves, Marion

TL;DR
This paper introduces a new class of lightweight, efficient, and analytically proven path-oriented obfuscation techniques that effectively counter Dynamic Symbolic Execution (DSE) attacks with minimal performance impact.
Contribution
The paper proposes a novel class of path-oriented protections specifically designed to hinder DSE-based deobfuscation, addressing limitations of existing defenses.
Findings
Effectively counters DSE-based deobfuscation attacks
Maintains low runtime overhead
Provides analytical proof of resistance
Abstract
Code obfuscation is a major tool for protecting software intellectual property from attacks such as reverse engineering or code tampering. Yet, recently proposed (automated) attacks based on Dynamic Symbolic Execution (DSE) shows very promising results, hence threatening software integrity. Current defenses are not fully satisfactory, being either not efficient against symbolic reasoning, or affecting runtime performance too much, or being too easy to spot. We present and study a new class of anti-DSE protections coined as path-oriented protections targeting the weakest spot of DSE, namely path exploration. We propose a lightweight, efficient, resistant and analytically proved class of obfuscation algorithms designed to hinder DSE-based attacks. Extensive evaluation demonstrates that these approaches critically counter symbolic deobfuscation while yielding only a very slight overhead.
| Obfuscation type | Slowdown | ||||
|---|---|---|---|---|---|
| Symbolic Execution | Overhead | ||||
| Coverage | Secret | runtime | |||
| Standard | Virt | ||||
| Virt | |||||
| Virt | ✓ | ||||
| Path-Oriented | |||||
| SPLIT | ✓ | ||||
| [5] | ✓ | ||||
| ✓ | |||||
| FOR | ✓ | ✓ | |||
| (new) | ✓ | ✓ | |||
| Transformation | Dataset #1 | Dataset #2 | |||
|---|---|---|---|---|---|
| (#TO/#Samples) | Goal 1 | ||||
| 3h TO | Goal 2 | ||||
| 1h TO | Goal 1 | ||||
| 24h TO | Goal 2 | ||||
| 3h TO | Goal 2 | ||||
| 8h TO | |||||
| Virt | |||||
| Virt | |||||
| Virt | |||||
| SPLIT () | |||||
| SPLIT () | |||||
| SPLIT () | |||||
| FOR () | |||||
| FOR () | |||||
| FOR () | |||||
| Transformation | Dataset #1 | Dataset #2 | ||
|---|---|---|---|---|
| RO | CI | RO | CI | |
| Virt | ||||
| Virt | ||||
| Virt | ||||
| SPLIT () | ||||
| SPLIT () | ||||
| SPLIT () | ||||
| FOR () | ||||
| FOR () | ||||
| FOR () | ||||
| FOR () | ||||
| FOR () P2 | ||||
| FOR () REC | ||||
| FOR () WORD | ||||
| Tool | Robust ? | ||
| P1 | P2 | P3 | |
| (basic) | (obfuscated) | (weak) | |
| GCC -Ofast | ✓ | ✓ | |
| clang -Ofast | ✓ | ||
| Frama-C Slice | ✓ | ✓ | |
| Frama-C Taint | ✓ | ✓ | ✓ |
| Triton (taint) | ✓ | ✓ | ✓ |
| KLEE | ✓ | ✓ | ✓ |
| Unprotected | Virt | Hardened – For (k=5) | |
|---|---|---|---|
| (TO = 10 sec) | (TO = 5 min) | (TO = 24h) | |
| KLEE | 10/10 | 10/10 | 0/10 ✓ |
| Binsec | 10/10 | 10/10 | 0/10 ✓ |
| Triton | 10/10 | 10/10 | 0/10 ✓ |
| Entry size | #LOC | KLEE exec. (s) | ||
|---|---|---|---|---|
| average | StdDev. | average | StdDev. | |
| 16 bytes | s | |||
| 1 byte (*) | s | |||
| Program | locs | KLEE exec. (s) |
|---|---|---|
| City hash | ||
| Fast hash | ||
| Spooky hash | ||
| MD5 hash | ||
| AES | ||
| DES | ||
| GRUB |
| Timeouts | ||||
| NURS | BFS | DFS | allpath | |
| Virt | ||||
| Virt | ||||
| Virt | ||||
| Flat-Virt | ||||
| Flat-MBA | ||||
| Split () | ||||
| Split () | ||||
| For () | ||||
| For () | ||||
| For () | ||||
| For () | ||||
| Obfuscation type | Slowdown | |||||
|---|---|---|---|---|---|---|
| Symbolic Execution | Overhead Runtime | Overhead Code Size | ||||
| Coverage | Secret | |||||
| Tigress | Virt | |||||
| Virt | TO | |||||
| Virt | TO | |||||
| Virt | TO | TO | ||||
| Virt-Flat | ||||||
| Flat | ||||||
| Flat-EncA | ||||||
| Our approach | TO | |||||
| For | TO | TO | ||||
| TO | TO | |||||
| Split | TO | |||||
| TO | ||||||
| DSE Slowdown | Runtime overhead | Code Size increase | #TO | |||||||
| Transformation | min | max | avg | min | max | avg | min | max | avg | |
| Virt | ||||||||||
| Virt | ||||||||||
| Virt | ||||||||||
| Flattening | ||||||||||
| EncodeArithmetic | ||||||||||
| Split () | ||||||||||
| Split () | ||||||||||
| For () | ||||||||||
| For () | ||||||||||
| For () | ||||||||||
| For () | ||||||||||
| Virt For () | ||||||||||
| DSE slowdown | Runtime overhead | #TO | |||||
| Transformation | min | max | avg | min | max | avg | |
| Virt | |||||||
| Virt | |||||||
| Virt | |||||||
| Flat-Virt | |||||||
| Flat-MBA | |||||||
| Split () | |||||||
| Split () | |||||||
| For () | |||||||
| For () | |||||||
| For () | |||||||
| For () | |||||||
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
22affiliationtext: LORIA - Nancy, France11affiliationtext: CEA LIST - Paris-Saclay, France
How to Kill Symbolic Deobfuscation for Free
(or: Unleashing the Potential of Path-Oriented Protections)
Mathilde Ollivier
Sebastien Bardin
Richard Bonichon
Jean-Yves Marion
Abstract
Code obfuscation is a major tool for protecting software intellectual property from attacks such as reverse engineering or code tampering. Yet, recently proposed (automated) attacks based on Dynamic Symbolic Execution (DSE) shows very promising results, hence threatening software integrity. Current defenses are not fully satisfactory, being either not efficient against symbolic reasoning, or affecting runtime performance too much, or being too easy to spot. We present and study a new class of anti-DSE protections coined as path-oriented protections targeting the weakest spot of DSE, namely path exploration. We propose a lightweight, efficient, resistant and analytically proved class of obfuscation algorithms designed to hinder DSE-based attacks. Extensive evaluation demonstrates that these approaches critically counter symbolic deobfuscation while yielding only a very slight overhead.
1 Introduction
Context
Reverse engineering and code tampering are widely used to extract proprietary assets (e.g., algorithms or cryptographic keys) or bypass security checks from software. Code protection techniques precisely seek to prevent, or at least make difficult, such man-at-the-end attacks, where the attacker has total control of the environment running the software under attack. Obfuscation **[20, 19]** aims at hiding a program’s behavior by transforming its executable code in such a way that the behavior is conserved but the program becomes much harder to understand.
Even though obfuscation techniques are quite resilient against basic automatic reverse engineering (including static attacks, e.g. disassembly, and dynamic attacks, e.g. monitoring), code analysis improves quickly **[37]**. Recent attacks based on Dynamic Symbolic Execution (DSE, a.k.a. concolic execution) **[17, 28, 38]** use logical formulas to represent input constraints along an execution path, and then automatically solve these constraints to discover new execution paths. DSE appears to be very efficient against existing obfuscations **[5, 8, 35, 49, 22]**, combining the best of dynamic and semantic analysis.
Problem
The current state of symbolic deobfuscation is actually pretty unclear. Dedicated protections have been proposed, mainly based on hard-to-solve predicates, like Mixed Boolean Arithmetic formulas (MBA) **[50]** or cryptographic hash functions **[40]**. Yet the effect of complexified constraints on automatic solvers is hard to predict **[6]**, while cryptographic hash functions are easy to spot, may induce significant overhead and are amenable to key extraction attacks (possibly by DSE).
On the other hand, DSE has been fruitfully applied on malware and legit codes protected by state-of-the-art tools and methods, including virtualization, self-modification, hashing or MBA **[49, 35, 8]**. The recent systematic experimental evaluation of symbolic deobfuscation by Banescu et al. **[5]** shows that most standard obfuscation techniques do not seriously impact DSE. Only nested virtualization seems to provide a good protection, assuming the defender is ready to pay a high cost in terms of runtime and code size **[35]**.
Goals and Challenges
We want to propose a new class of dedicated anti-DSE obfuscation techniques to render automated attacks based on symbolic execution inefficient. These techniques should be strong – making DSE intractable in practice, and lightweight – with very low overhead in both code size and runtime performance. While most anti-DSE defenses try to break the symbolic reasoning part of DSE (constraint solver), we instead target its real weak spot, namely path exploration. Banescu et al. **[5]** present one such specific obfuscation scheme but with a large space overhead and no experimental evaluation. We aim at proposing a general framework to understand such obfuscations and to define new schemes both strong and lightweight.
Contribution
We study path-oriented protections, a class of protections seeking to hinder DSE by substantially increasing the number of feasible paths within a program.
- •
We detail a formal framework describing path-oriented protections (Sec. 4). We characterize their desirable properties — namely tractability, strength, and the key criterion of single value path (SVP). The framework is predictive, in the sense that our classification is confirmed by experimental evaluation (Sec. 8), allowing both to shed new light on the few existing path-oriented protections and to provide guidelines to design new ones. In particular, no existing protection **[5]** achieves both tractability and optimal strength (SVP). As a remedy, we propose the first two obfuscation schemes achieving both tractability and optimal strength (Sec. 5).
- •
We highlight the importance of the anchorage policy, i.e. the way to choose where to insert protection in the code, in terms of protection efficiency and robustness. Especially, we identify a way to achieve optimal composition of path-oriented protections (Sec. 6.1), and to completely prevent taint-based and slic-based attacks (two powerful code-level attacks against obfuscation), coined as resistance by design (Sec. 6.2).
- •
We conduct extensive experiments (Sec. 8.3) with two different attack scenarios — exhaustive path coverage (Sec. 8.3) and secret finding. Results confirm that path-oriented protections are much stronger against DSE attacks than standard protections (including nested virtualization) for only a slight overhead. Moreover, while existing techniques **[5]** can still be weak in some scenarios (e.g., secret finding), our new optimal schemes cripple symbolic deobfuscation at essentially no cost in any setting. Finally, experiments against slice, pattern-matching and taint attacks confirm the quality of our robust-by-design mechanism.
As a practical outcome, we propose a new hardened deobfuscation benchmark (Sec. 9), currently out-of-reach of symbolic engines, in order to extend existing obfuscation benchmarks **[1, 5, 35]**.
Discussion
We study a powerful class of protections against symbolic deobfuscation, based on a careful analysis of DSE – we target its weakest point (path exploration) when other dedicated methods usually aim at its strongest point (constraint solving and ever-evolving SMT solvers). We propose a predictive framework allowing to understand these protections, as well as several concrete protections impacting DSE more than three levels of virtualization at essentially no cost. We expect them to be also efficient against other semantic attacks **[10, 29]** (cf. Sec. 10). From a methodological point of view, this work extends recent attempts at rigorous evaluation of obfuscation methods. We provide both an analytical evaluation, as Bruni et al. **[14]** for anti-abstract model checking, and a refinement of the experimental setup initiated by Banescu et al. **[5]**.
2 Motivation
2.1 Attacker model
Goal
We consider man-at-the-end scenarios where the attacker has full access to a potentially protected code under attack. The attacker has just the binary code and no access to the source code. The attacker model and the methodology follows closely the survey by Schrittwieser et al. **[37*]**. In order to be more concrete, we will focus on the following (intermediate) goals: *
- *Exhaustive Path Exploration. Covering every feasible path of the binary allows the attacker to retrieve a consistent Control Flow Graph and understand what the original program performs.
- *Secret Finding. Focusing on a specific part of the code (e.g. license checks) and try to understand or retrieve a secret (e.g. a key).
**
Capacity
we assume an all-powerful symbolic adversary, that is, this adversary can run a correct and complete Dynamic Symbolic Execution (DSE). In practice, symbolic engines are correct — every path discovered is actually feasible — but incomplete — they can be tricked into missing feasible paths **[48]** or the underlying solver may timeout. This adversary can also perform additional code-analysis based attack steps, such as slicing- **[42]**, pattern-matching or tainting-based **[38]** code simplifications.
Caveat
In the remains, we always consider this attacker model. There is one caveat that it is worth mentioning now: part of our experimental evaluations are done from source codes, i.e. in our experiments the attacker has sometimes the source code. The reason is that state-of-the-art source-level DSE tools are much more efficient than binary-level ones, and that there is no good state-of-the-art tools to perform slice or taint attacks on binary codes. Our experimental conditions are much more in favor of the attacker, and as a result they show that our approach is all the more effective.
2.2 Motivating example
Let us illustrate anti-symbolic path-oriented protections on a toy crackme program111https://github.com/trailofbits/manticore. Fig. 1 displays a skeleton of its source code. main calls check to verify each character of the 11 bytes input. It then outputs ”win” for a correct guess, ”lose” otherwise. Each subfunction check_char_i hides a secret character value behind bitwise transformations, like xor or shift. Such a challenge can be easily solved, completely automatically, by symbolic execution tools. KLEE **[16]** needs 0.03s (on C code) and Binsec **[24]** 0.3s (on binary code) to both find a winning input and explore all paths.
Standard protections
Let us now protect the program with standard obfuscations to measure their impact on symbolic deobfuscation. We will rely on Tigress **[21]**, a widely used tool for systematic evaluation of deobfuscation methods **[8, 35, 5]**, to apply (nested) virtualization, a most effective obfuscation **[5]**. Yet, Sec. 2.2 clearly shows that virtualization does not prevent KLEE from finding the winning output, though it can thwart path exploration – but with a high runtime overhead (40).
The case for (new) path-oriented protections
To defend against symbolic attackers, we thus need better anti-DSE obfuscation: path-oriented protections. Such protections aim at exponentially increasing the number of paths that a DSE-based deobfuscation tool, like KLEE, must explore. Two such protections are Split and For, illustrated in Fig. 2 on function check_char_0 of the example.
For the sake of simplicity, the protections are implemented in a naive form, sensitive to slicing or compiler optimizations. Robustness is discussed afterwards. In a nutshell, Split— an instance of Range Divider **[5*]** — adds a number of conditional statements depending on new fresh inputs, increasing the number of paths to explore by a factor of . Also, in this implementation we use a junk variable garb and two additional inputs ch1 and ch2 unrelated to the original code. The novel obfuscation For (Sec. 5) adds loops whose upper bound depends on distinct input bytes and which recompute a value that will be used later, expanding the number of paths to explore by a factor of – assuming a 8-bit char type. This implementation does not introduce any junk variable nor additional input. In both cases, the obfuscated code relies on the input, forcing DSE to explore a priori all paths. Sec. 2.2 summarizes the performance of Split and For. *
• Both Split and For do not induce any overhead, * • Splitis highly efficient (timeout) against coverage but not against secret finding, while For is highly efficient for both. * • For*() performs already better than Split () and further experiments (Sec. 8) shows For to be a much more effective path protection than Split. * **
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Tigress challenge. http://tigress.cs.arizona.edu/challenges.html.
- 2[2] S. Anand, E. K. Burke, T. Y. Chen, J. Clark, M. B. Cohen, W. Grieskamp, M. Harman, M. J. Harrold, and P. Mc Minn. An orchestrated survey of methodologies for automated software test case generation. Journal of Systems and Software , 2013.
- 3[3] Thanassis Avgerinos, Alexandre Rebert, Sang Kil Cha, and David Brumley. Enhancing symbolic execution with veritesting. Commun. ACM , 59(6), 2016.
- 4[4] Gogul Balakrishnan and Thomas W. Reps. WYSINWYX: what you see is not what you execute. ACM Trans. Program. Lang. Syst. , 32, 2010.
- 5[5] Sebastian Banescu, Christian S. Collberg, Vijay Ganesh, Zack Newsham, and Alexander Pretschner. Code obfuscation against symbolic execution attacks. In Annual Conference on Computer Security Applications, ACSAC 2016 , 2016.
- 6[6] Sebastian Banescu, Christian S. Collberg, and Alexander Pretschner. Predicting the resilience of obfuscated code against symbolic execution attacks via machine learning. In USENIX Security Symposium , 2017.
- 7[7] Boaz Barak, Oded Goldreich, Russell Impagliazzo, Steven Rudich, Amit Sahai, Salil P. Vadhan, and Ke Yang. On the (im)possibility of obfuscating programs. In Advances in Cryptology - CRYPTO , 2001.
- 8[8] Sébastien Bardin, Robin David, and Jean-Yves Marion. Backward-bounded DSE: targeting infeasibility questions on obfuscated codes. In 2017 IEEE Symposium on Security and Privacy, SP , 2017.
