How to Kill Symbolic Deobfuscation for Free; or Unleashing the Potential   of Path-Oriented Protections

Mathilde Ollivier; S\'ebastien Bardin; Richard Bonichon; Jean-Yves; Marion

arXiv:1908.01549·cs.CR·August 8, 2019

How to Kill Symbolic Deobfuscation for Free; or Unleashing the Potential of Path-Oriented Protections

Mathilde Ollivier, S\'ebastien Bardin, Richard Bonichon, Jean-Yves, Marion

PDF

TL;DR

This paper introduces a new class of lightweight, efficient, and analytically proven path-oriented obfuscation techniques that effectively counter Dynamic Symbolic Execution (DSE) attacks with minimal performance impact.

Contribution

The paper proposes a novel class of path-oriented protections specifically designed to hinder DSE-based deobfuscation, addressing limitations of existing defenses.

Findings

01

Effectively counters DSE-based deobfuscation attacks

02

Maintains low runtime overhead

03

Provides analytical proof of resistance

Abstract

Code obfuscation is a major tool for protecting software intellectual property from attacks such as reverse engineering or code tampering. Yet, recently proposed (automated) attacks based on Dynamic Symbolic Execution (DSE) shows very promising results, hence threatening software integrity. Current defenses are not fully satisfactory, being either not efficient against symbolic reasoning, or affecting runtime performance too much, or being too easy to spot. We present and study a new class of anti-DSE protections coined as path-oriented protections targeting the weakest spot of DSE, namely path exploration. We propose a lightweight, efficient, resistant and analytically proved class of obfuscation algorithms designed to hinder DSE-based attacks. Extensive evaluation demonstrates that these approaches critically counter symbolic deobfuscation while yielding only a very slight overhead.

Tables12

Table 1. Table 1 : DSE Attack on the Crackme Example (KLEE)

	Obfuscation type		Slowdown
Symbolic Execution	Overhead
			Coverage	Secret	runtime
Standard	Virt		$\times, \times$	$\times, \times$	$\times 1.1$
Standard	Virt $\times 2$		$\times$	$\times, \times$	$\times 1.3$
	Virt $\times 3$		✓	$\times$	$\times 40$
Path-Oriented		$k = 11$	$\times, \times$	$\times, \times$	$\times 1.0$
	SPLIT	$k = 15$	✓	$\times, \times$	$\times 1.0$
	[5]	$k = 19$	✓	$\times, \times$	$\times 1.0$
		$k = 1$	✓	$\times$	$\times 1.0$
	FOR	$k = 2$	✓	✓	$\times 1.0$
	(new)	$k = 3$	✓	✓	$\times 1.0$

Table 2. Table 2 : Classification of obfuscation schemes

		Tractable		SVP	Stealth
		Time	Space
Range Divider [5]	switch	✓	$\times$	✓	$\times$
Split [5]	if	✓	✓	$\times$	✓
For	word	$\times$	✓	✓	✓
For	byte	✓	✓	✓	✓
Write		✓	✓	✓	$\sim$

Table 3. Table 3 : Impact of obfuscations on DSE

Transformation	Dataset #1		Dataset #2
Transformation
(#TO/#Samples)	Goal 1
3h TO	Goal 2
1h TO	Goal 1
24h TO	Goal 2
3h TO	Goal 2
8h TO
Virt	$0 / 46$	$0 / 15$	$0 / 7$	$0 / 7$	$0 / 7$
Virt $\times 2$	$1 / 46$	$0 / 15$	$0 / 7$	$0 / 7$	$0 / 7$
Virt $\times 3$	$5 / 46$	$2 / 15$	$1 / 7$	$0 / 7$	$0 / 7$
SPLIT ( $k = 10$ )	$1 / 46$	$0 / 15$	$0 / 7$	$0 / 7$	$0 / 7$
SPLIT ( $k = 13$ )	$4 / 46$	$0 / 15$	$1 / 7$	$1 / 7$	$0 / 7$
SPLIT ( $k = 17$ )	$18 / 46$	$2 / 15$	$3 / 7$	$2 / 7$	$1 / 7$
FOR ( $k = 1$ )	$2 / 46$	$0 / 15$	$0 / 7$	$0 / 7$	$0 / 7$
FOR ( $k = 3$ )	$30 / 46$	$8 / 15$	$3 / 7$	$2 / 7$	$1 / 7$
FOR ( $k = 5$ )	$46 / 46$	$15 / 15$	$7 / 7$	$7 / 7$	$7 / 7$

Table 4. Table 4 : Impact of obfuscations on runtime performance

Transformation	Dataset #1		Dataset #2
Transformation
	RO	CI	RO	CI

Virt	$\times 1.5$	$\times 1.5$	$\times 1.7$	$\times 1.4$
Virt $\times 2$	$\times 15$	$\times 2.5$	$\times 5.1$	$\times 2.1$
Virt $\times 3$	$\times 1.6 \cdot 10^{3}$	$\times 4$	$\times 362$	$\times 3.0$

SPLIT ( $k = 10$ )	$\times 1.2$	$\times 1.0$	$\times 1.0$	$\times 1.0$
SPLIT ( $k = 13$ )	$\times 1.2$	$\times 1.0$	$\times 1.0$	$\times 1.0$
SPLIT ( $k = 50$ )	$\times 1.5$	$\times 1.5$	$\times 1.1$	$\times 1.0$
FOR ( $k = 1$ )	$\times 1.0$	$\times 1.0$	$\times 1.0$	$\times 1.0$
FOR ( $k = 3$ )	$\times 1.1$	$\times 1.0$	$\times 1.0$	$\times 1.0$
FOR ( $k = 5$ )	$\times 1.3$	$\times 1.0$	$\times 1.1$	$\times 1.0$
FOR ( $k = 50$ )	$\times 1.5$	$\times 1.5$	$\times 1.2$	$\times 1.1$

FOR ( $k = 5$ ) P2	$\times 1.3$	$\times 1.0$	$\times 1.1$	$\times 1.0$
FOR ( $k = 5$ ) REC	$\times 3.0$	$\times 1.0$	$\times 2.7$	$\times 1.0$

FOR ( $k = 1$ ) WORD	$\times 2.6 \cdot 10^{3}$	$\times 1.0$	$\times 2.1 \cdot 10^{3}$	$\times 1.0$

Table 5. Table 5 : Robustness of path-oriented protections

Tool	Robust ?
Tool
	P1	P2	P3
	(basic)	(obfuscated)	(weak)

GCC -Ofast	✓	✓	$\times$
clang -Ofast	$\times$	✓	$\times$
Frama-C Slice	✓	✓	$\times$
Frama-C Taint	✓	✓	✓
Triton (taint)	✓	✓	✓
KLEE	✓	✓	✓

Table 6. Table 6 : Results on 10 hardened examples (secret finding)

	Unprotected	Virt $\times 1$	Hardened – For (k=5)
	(TO = 10 sec)	(TO = 5 min)	(TO = 24h)
KLEE	10/10	10/10	0/10 ✓
Binsec	10/10	10/10	0/10 ✓
Triton	10/10	10/10	0/10 ✓

Table 7. Table 7 : Statistics on Dataset #1 (46 programs)

Entry size	#LOC		KLEE exec. (s)
Entry size
	average	StdDev.	average	StdDev.
16 bytes	$21$	$1.9$	$2.6$ s	$6.2$
1 byte (*)	$17$	$2.2$	$1.8$ s	$6.2$

Table 8. Table 8 : Statistics on Dataset #2 (7 programs)

Program	locs	KLEE exec. (s)
City hash	$547$	$7.41$
Fast hash	$934$	$7.74$
Spooky hash	$625$	$7.12$
MD5 hash	$157$	$33.31$
AES	$571$	$1.42$
DES	$424$	$0.15$
GRUB	$101$	$0.06$

Table 9. Table 9 : Impact of search heuristics – Dataset #1 – secret finding – 1h TO

	Timeouts
	NURS	BFS	DFS	allpath
Virt	$0 / 15$	$0 / 15$	$0 / 15$	$0 / 15$
Virt $\times 2$	$0 / 15$	$0 / 15$	$0 / 15$	$0 / 15$
Virt $\times 3$	$1 / 15$	$1 / 15$	$1 / 15$	$2 / 15$
Flat-Virt	$0 / 15$	$0 / 15$	$0 / 15$	$0 / 15$
Flat-MBA	$0 / 15$	$0 / 15$	$0 / 15$	$0 / 15$
Split ( $\times 10$ )	$0 / 15$	$0 / 15$	$0 / 15$	$0 / 15$
Split ( $\times 13$ )	$1 / 15$	$1 / 15$	$0 / 15$	$1 / 15$
For ( $k = 1$ )	$0 / 15$	$0 / 15$	$0 / 15$	$0 / 15$
For ( $k = 2$ )	$1 / 15$	$1 / 15$	$1 / 15$	$4 / 15$
For ( $k = 3$ )	$10 / 15$	$8 / 15$	$8 / 15$	$13 / 15$
For ( $k = 4$ )	$𝟏𝟓 / 𝟏𝟓$	$𝟏𝟓 / 𝟏𝟓$	$𝟏𝟓 / 𝟏𝟓$	$𝟏𝟓 / 𝟏𝟓$

Table 10. Table 10 : Benchmarking obfuscations on the crackme example – tool KLEE– 1h30 timeout.

	Obfuscation type		Slowdown
Symbolic Execution	Overhead Runtime	Overhead Code Size
	Overhead Runtime	Overhead Code Size	Coverage	Secret
Tigress	Virt $\times 2$		$\times 5.4 \cdot 10^{3}$	$\times 11$	$\times 1.3$	$\times 1.2$
	Virt $\times 3$		TO	$\times 1.1 \cdot 10^{3}$	$\times 41$	$\times 3.0$
	Virt $\times 4$		TO	$\times 96 \cdot 10^{3}$	$\times 4.5 \cdot 10^{3}$	$\times 4.0$
	Virt $\times 5$		TO	TO	$\times 449 \cdot 10^{3}$	$\times 5.2$
	Virt-Flat		$\times 1.0$	$\times 1.8$	$\times 1.1$	$\times 1.5$
	Flat $\times 2$		$\times 276$	$\times 1.8$	$\times 1.1$	$\times 1.3$
	Flat-EncA		$\times 83$	$\times 1.2$	$\times 1.0$	$\times 1.0$

Our approach		$k = 1$	TO	$\times 3.3 \cdot 10^{3}$	$\times 1.0$	$\times 1.0$
	For	$k = 2$	TO	TO	$\times 1.0$	$\times 1.0$
		$k = 3$	TO	TO	$\times 1.0$	$\times 1.0$
		$k = 11$	$\times 3.4 \cdot 10^{3}$	$\times 2.8$	$\times 1.0$	$\times 1.0$
	Split	$k = 15$	TO	$\times 3.9$	$\times 1.0$	$\times 1.0$
		$k = 19$	TO	$\times 5.1$	$\times 1.0$	$\times 1.0$

Table 11. Table 11 : Obfuscations on Dataset #1 — allpath coverage — 3h timeout

	DSE Slowdown			Runtime overhead			Code Size increase			#TO
Transformation	min	max	avg	min	max	avg	min	max	avg	#TO
Virt	$\times 1.0$	$\times 17$	$\times 2.8$	$\times 1.2$	$\times 5.6$	$\times 1.5$	$\times 1.5$	$\times 1.5$	$\times 1.5$	$0 / 46$
Virt $\times 2$	$\times 1.0$	$\times 402$	$\times 47$	$\times 1.3$	$\times 432$	$\times 15$	$\times 2.3$	$\times 2.8$	$\times 2.5$	$1 / 46$
Virt $\times 3$	$\times 𝟏𝟎$	$\times 35 \cdot 10^{3}$	$\times 3.0 \cdot 10^{3}$	$\times 3.2$	$\times 52 \cdot 10^{3}$	$\times 1.6 \cdot 10^{3}$	$\times 3.5$	$\times 4.6$	$\times 4$	$5 / 46$
Flattening	$\times 1.0$	$\times 1.3$	$\times 1.0$	$\times 1.0$	$\times 2.0$	$\times 1.8$	$\times 1.0$	$\times 1.0$	$\times 1.0$	$0 / 46$
EncodeArithmetic	$\times 1.0$	$\times 10$	$\times 3.9$	$\times 1.0$	$\times 2.0$	$\times 1.8$	$\times 1.0$	$\times 1.0$	$\times 1.0$	$0 / 46$
Split ( $k = 10$ )	$\times 𝟏𝟎$	$\times 1.2 \cdot 10^{3}$	$\times 107$	$\times 1.0$	$\times 1.3$	$\times 1.2$	$\times 1.0$	$\times 1.0$	$\times 1.0$	$1 / 46$
Split ( $k = 13$ )	$\times 𝟏𝟎$	$\times 15 \cdot 10^{3}$	$\times 862$	$\times 1.0$	$\times 1.3$	$\times 1.2$	$\times 1.0$	$\times 1.0$	$\times 1.0$	$4 / 46$
For ( $k = 1$ )	$\times 𝟏𝟎$	$\times 476$	$\times 209$	$\times 1.0$	$\times 1.4$	$\times 1.2$	$\times 1.0$	$\times 1.0$	$\times 1.0$	$2 / 46$
For ( $k = 2$ )	$\times 𝟏𝟎$	$\times 33 \cdot 10^{3}$	$\times 3.7 \cdot 10^{3}$	$\times 1.0$	$\times 1.4$	$\times 1.0$	$\times 1.0$	$\times 1.0$	$\times 1.0$	$10 / 46$
For ( $k = 3$ )	$\times 𝟏𝟎$	$\times 1.1 \cdot 10^{6}$	$\times 2.2 \cdot 10^{5}$	$\times 1.0$	$\times 1.4$	$\times 1.3$	$\times 1.0$	$\times 1.0$	$\times 1.0$	$30 / 46$
For ( $k = 4$ )	$\times 𝟏𝟎$	$\times 2.2 \cdot {𝟏𝟎}^{𝟔}$	$\times 5.1 \cdot {𝟏𝟎}^{𝟓}$	$\times 1.0$	$\times 1.4$	$\times 1.3$	$\times 1.0$	$\times 1.0$	$\times 1.0$	$𝟒𝟔 / 𝟒𝟔$
Virt $+$ For ( $k = 2$ )	$\times 𝟏𝟎$	$\times 5.4 \cdot 10^{5}$	$\times 33 \cdot 10^{3}$	$\times 1.0$	$\times 3.8$	$\times 1.2$	$\times 1.5$	$\times 1.6$	$\times 1.6$	$23 / 46$

Table 12. Table 12 : Obfuscations on Dataset #1 — — Secret Finding — DFS heuristics, 1h timeout

	DSE slowdown			Runtime overhead			#TO
Transformation	min	max	avg	min	max	avg
Virt	$\times 1.0$	$\times 4.0$	$\times 1.6$	$\times 1.2$	$\times 1.4$	$\times 1.3$	$0 / 15$
Virt $\times 2$	$\times 6$	$\times 268$	$\times 33$	$\times 1.3$	$\times 6.3$	$\times 2.5$	$0 / 15$
Virt $\times 3$	$\times 557$	$\times 4.7 \cdot 10^{3}$	$\times 1.7 \cdot 10^{3}$	$\times 5.5$	$\times 513$	$\times 126$	$2 / 15$
Flat-Virt	$\times 1.0$	$\times 8.3$	$\times 2.3$	$\times 1.2$	$\times 1.5$	$\times 1.3$	$0 / 15$
Flat-MBA	$\times 2.0$	$\times 878$	$\times 59$	$\times 1.2$	$\times 1.3$	$\times 1.3$	$0 / 15$
Split ( $k = 10$ )	$\times 1.1$	$\times 9$	$\times 6$	$\times 1.0$	$\times 1.3$	$\times 1.2$	$0 / 15$
Split ( $k = 13$ )	$\times 1.1$	$\times 12$	$\times 8$	$\times 1.3$	$\times 1.8$	$\times 1.6$	$0 / 15$
For ( $k = 1$ )	$\times 7$	$\times 1.1 \cdot 10^{3}$	$\times 169$	$\times 1.0$	$\times 1.1$	$\times 1.0$	$0 / 15$
For ( $k = 2$ )	$\times 841$	$\times 1.7 \cdot 10^{5}$	$\times 17 \cdot 10^{3}$	$\times 1.0$	$\times 1.1$	$\times 1.0$	$1 / 15$
For ( $k = 3$ )	$\times 2.3 \cdot 10^{3}$	$\times 3.6 \cdot 10^{5}$	$\times 1.6 \cdot 10^{5}$	$\times 1.0$	$\times 1.1$	$\times 1.0$	$8 / 15$
For ( $k = 4$ )	$\times 2.1 \cdot {𝟏𝟎}^{𝟓}$	$\times 4.2 \cdot {𝟏𝟎}^{𝟓}$	$\times 3.2 \cdot {𝟏𝟎}^{𝟓}$	$\times 1.0$	$\times 1.1$	$\times 1.0$	$𝟏𝟓 / 𝟏𝟓$

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

22affiliationtext: LORIA - Nancy, France11affiliationtext: CEA LIST - Paris-Saclay, France

How to Kill Symbolic Deobfuscation for Free

(or: Unleashing the Potential of Path-Oriented Protections)

Mathilde Ollivier

Sebastien Bardin

Richard Bonichon

Jean-Yves Marion

Abstract

Code obfuscation is a major tool for protecting software intellectual property from attacks such as reverse engineering or code tampering. Yet, recently proposed (automated) attacks based on Dynamic Symbolic Execution (DSE) shows very promising results, hence threatening software integrity. Current defenses are not fully satisfactory, being either not efficient against symbolic reasoning, or affecting runtime performance too much, or being too easy to spot. We present and study a new class of anti-DSE protections coined as path-oriented protections targeting the weakest spot of DSE, namely path exploration. We propose a lightweight, efficient, resistant and analytically proved class of obfuscation algorithms designed to hinder DSE-based attacks. Extensive evaluation demonstrates that these approaches critically counter symbolic deobfuscation while yielding only a very slight overhead.

1 Introduction

Context

Reverse engineering and code tampering are widely used to extract proprietary assets (e.g., algorithms or cryptographic keys) or bypass security checks from software. Code protection techniques precisely seek to prevent, or at least make difficult, such man-at-the-end attacks, where the attacker has total control of the environment running the software under attack. Obfuscation **[20, 19]** aims at hiding a program’s behavior by transforming its executable code in such a way that the behavior is conserved but the program becomes much harder to understand.

Even though obfuscation techniques are quite resilient against basic automatic reverse engineering (including static attacks, e.g. disassembly, and dynamic attacks, e.g. monitoring), code analysis improves quickly **[37]**. Recent attacks based on Dynamic Symbolic Execution (DSE, a.k.a. concolic execution) **[17, 28, 38]** use logical formulas to represent input constraints along an execution path, and then automatically solve these constraints to discover new execution paths. DSE appears to be very efficient against existing obfuscations **[5, 8, 35, 49, 22]**, combining the best of dynamic and semantic analysis.

Problem

The current state of symbolic deobfuscation is actually pretty unclear. Dedicated protections have been proposed, mainly based on hard-to-solve predicates, like Mixed Boolean Arithmetic formulas (MBA) **[50]** or cryptographic hash functions **[40]**. Yet the effect of complexified constraints on automatic solvers is hard to predict **[6]**, while cryptographic hash functions are easy to spot, may induce significant overhead and are amenable to key extraction attacks (possibly by DSE).

On the other hand, DSE has been fruitfully applied on malware and legit codes protected by state-of-the-art tools and methods, including virtualization, self-modification, hashing or MBA **[49, 35, 8]**. The recent systematic experimental evaluation of symbolic deobfuscation by Banescu et al. **[5]** shows that most standard obfuscation techniques do not seriously impact DSE. Only nested virtualization seems to provide a good protection, assuming the defender is ready to pay a high cost in terms of runtime and code size **[35]**.

Goals and Challenges

We want to propose a new class of dedicated anti-DSE obfuscation techniques to render automated attacks based on symbolic execution inefficient. These techniques should be strong – making DSE intractable in practice, and lightweight – with very low overhead in both code size and runtime performance. While most anti-DSE defenses try to break the symbolic reasoning part of DSE (constraint solver), we instead target its real weak spot, namely path exploration. Banescu et al. **[5]** present one such specific obfuscation scheme but with a large space overhead and no experimental evaluation. We aim at proposing a general framework to understand such obfuscations and to define new schemes both strong and lightweight.

Contribution

We study path-oriented protections, a class of protections seeking to hinder DSE by substantially increasing the number of feasible paths within a program.

•

We detail a formal framework describing path-oriented protections (Sec. 4). We characterize their desirable properties — namely tractability, strength, and the key criterion of single value path (SVP). The framework is predictive, in the sense that our classification is confirmed by experimental evaluation (Sec. 8), allowing both to shed new light on the few existing path-oriented protections and to provide guidelines to design new ones. In particular, no existing protection **[5]** achieves both tractability and optimal strength (SVP). As a remedy, we propose the first two obfuscation schemes achieving both tractability and optimal strength (Sec. 5).

•

We highlight the importance of the anchorage policy, i.e. the way to choose where to insert protection in the code, in terms of protection efficiency and robustness. Especially, we identify a way to achieve optimal composition of path-oriented protections (Sec. 6.1), and to completely prevent taint-based and slic-based attacks (two powerful code-level attacks against obfuscation), coined as resistance by design (Sec. 6.2).

•

We conduct extensive experiments (Sec. 8.3) with two different attack scenarios — exhaustive path coverage (Sec. 8.3) and secret finding. Results confirm that path-oriented protections are much stronger against DSE attacks than standard protections (including nested virtualization) for only a slight overhead. Moreover, while existing techniques **[5]** can still be weak in some scenarios (e.g., secret finding), our new optimal schemes cripple symbolic deobfuscation at essentially no cost in any setting. Finally, experiments against slice, pattern-matching and taint attacks confirm the quality of our robust-by-design mechanism.

As a practical outcome, we propose a new hardened deobfuscation benchmark (Sec. 9), currently out-of-reach of symbolic engines, in order to extend existing obfuscation benchmarks **[1, 5, 35]**.

Discussion

We study a powerful class of protections against symbolic deobfuscation, based on a careful analysis of DSE – we target its weakest point (path exploration) when other dedicated methods usually aim at its strongest point (constraint solving and ever-evolving SMT solvers). We propose a predictive framework allowing to understand these protections, as well as several concrete protections impacting DSE more than three levels of virtualization at essentially no cost. We expect them to be also efficient against other semantic attacks **[10, 29]** (cf. Sec. 10). From a methodological point of view, this work extends recent attempts at rigorous evaluation of obfuscation methods. We provide both an analytical evaluation, as Bruni et al. **[14]** for anti-abstract model checking, and a refinement of the experimental setup initiated by Banescu et al. **[5]**.

2 Motivation

2.1 Attacker model

Goal

We consider man-at-the-end scenarios where the attacker has full access to a potentially protected code under attack. The attacker has just the binary code and no access to the source code. The attacker model and the methodology follows closely the survey by Schrittwieser et al. **[37*]**. In order to be more concrete, we will focus on the following (intermediate) goals: *

*Exhaustive Path Exploration. Covering every feasible path of the binary allows the attacker to retrieve a consistent Control Flow Graph and understand what the original program performs.

*Secret Finding. Focusing on a specific part of the code (e.g. license checks) and try to understand or retrieve a secret (e.g. a key).

**

Capacity

we assume an all-powerful symbolic adversary, that is, this adversary can run a correct and complete Dynamic Symbolic Execution (DSE). In practice, symbolic engines are correct — every path discovered is actually feasible — but incomplete — they can be tricked into missing feasible paths **[48]** or the underlying solver may timeout. This adversary can also perform additional code-analysis based attack steps, such as slicing- **[42]**, pattern-matching or tainting-based **[38]** code simplifications.

Caveat

In the remains, we always consider this attacker model. There is one caveat that it is worth mentioning now: part of our experimental evaluations are done from source codes, i.e. in our experiments the attacker has sometimes the source code. The reason is that state-of-the-art source-level DSE tools are much more efficient than binary-level ones, and that there is no good state-of-the-art tools to perform slice or taint attacks on binary codes. Our experimental conditions are much more in favor of the attacker, and as a result they show that our approach is all the more effective.

2.2 Motivating example

Let us illustrate anti-symbolic path-oriented protections on a toy crackme program111https://github.com/trailofbits/manticore. Fig. 1 displays a skeleton of its source code. main calls check to verify each character of the 11 bytes input. It then outputs ”win” for a correct guess, ”lose” otherwise. Each subfunction check_char_i ${}_{\_}{i\in[0,10]}$ hides a secret character value behind bitwise transformations, like xor or shift. Such a challenge can be easily solved, completely automatically, by symbolic execution tools. KLEE **[16]** needs 0.03s (on C code) and Binsec **[24]** 0.3s (on binary code) to both find a winning input and explore all paths.

Standard protections

Let us now protect the program with standard obfuscations to measure their impact on symbolic deobfuscation. We will rely on Tigress **[21]**, a widely used tool for systematic evaluation of deobfuscation methods **[8, 35, 5]**, to apply (nested) virtualization, a most effective obfuscation **[5]**. Yet, Sec. 2.2 clearly shows that virtualization does not prevent KLEE from finding the winning output, though it can thwart path exploration – but with a high runtime overhead (40 $\times$ ).

The case for (new) path-oriented protections

To defend against symbolic attackers, we thus need better anti-DSE obfuscation: path-oriented protections. Such protections aim at exponentially increasing the number of paths that a DSE-based deobfuscation tool, like KLEE, must explore. Two such protections are Split and For, illustrated in Fig. 2 on function check_char_0 of the example.

For the sake of simplicity, the protections are implemented in a naive form, sensitive to slicing or compiler optimizations. Robustness is discussed afterwards. In a nutshell, Split— an instance of Range Divider **[5*]** — adds a number $k$ of conditional statements depending on new fresh inputs, increasing the number of paths to explore by a factor of $2^{k}$ . Also, in this implementation we use a junk variable garb and two additional inputs ch1 and ch2 unrelated to the original code. The novel obfuscation For (Sec. 5) adds $k$ loops whose upper bound depends on distinct input bytes and which recompute a value that will be used later, expanding the number of paths to explore by a factor of $2^{8\cdot k}$ – assuming a 8-bit char type. This implementation does not introduce any junk variable nor additional input. In both cases, the obfuscated code relies on the input, forcing DSE to explore a priori all paths. Sec. 2.2 summarizes the performance of Split and For. *

• Both Split and For do not induce any overhead, * • Splitis highly efficient (timeout) against coverage but not against secret finding, while For is highly efficient for both. * • For*( $k=2$ ) performs already better than Split ( $k=19$ ) and further experiments (Sec. 8) shows For to be a much more effective path protection than Split. * **

Bibliography50

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] Tigress challenge. http://tigress.cs.arizona.edu/challenges.html.
2[2] S. Anand, E. K. Burke, T. Y. Chen, J. Clark, M. B. Cohen, W. Grieskamp, M. Harman, M. J. Harrold, and P. Mc Minn. An orchestrated survey of methodologies for automated software test case generation. Journal of Systems and Software , 2013.
3[3] Thanassis Avgerinos, Alexandre Rebert, Sang Kil Cha, and David Brumley. Enhancing symbolic execution with veritesting. Commun. ACM , 59(6), 2016.
4[4] Gogul Balakrishnan and Thomas W. Reps. WYSINWYX: what you see is not what you execute. ACM Trans. Program. Lang. Syst. , 32, 2010.
5[5] Sebastian Banescu, Christian S. Collberg, Vijay Ganesh, Zack Newsham, and Alexander Pretschner. Code obfuscation against symbolic execution attacks. In Annual Conference on Computer Security Applications, ACSAC 2016 , 2016.
6[6] Sebastian Banescu, Christian S. Collberg, and Alexander Pretschner. Predicting the resilience of obfuscated code against symbolic execution attacks via machine learning. In USENIX Security Symposium , 2017.
7[7] Boaz Barak, Oded Goldreich, Russell Impagliazzo, Steven Rudich, Amit Sahai, Salil P. Vadhan, and Ke Yang. On the (im)possibility of obfuscating programs. In Advances in Cryptology - CRYPTO , 2001.
8[8] Sébastien Bardin, Robin David, and Jean-Yves Marion. Backward-bounded DSE: targeting infeasibility questions on obfuscated codes. In 2017 IEEE Symposium on Security and Privacy, SP , 2017.