SLING: Using Dynamic Analysis to Infer Program Invariants in Separation   Logic

Ton Chanh Le; Guolong Zheng; and ThanhVu Nguyen

arXiv:1903.09713·cs.PL·July 2, 2019

SLING: Using Dynamic Analysis to Infer Program Invariants in Separation Logic

Ton Chanh Le, Guolong Zheng, and ThanhVu Nguyen

PDF

TL;DR

SLING is a dynamic analysis tool that infers separation logic invariants for heap-manipulating programs by analyzing execution traces and memory regions, aiding in understanding program behavior.

Contribution

Introduces SLING, a novel dynamic analysis method that automatically generates separation logic invariants from execution traces of C programs.

Findings

01

Efficiently generates correct invariants for complex data structures

02

Works at arbitrary program locations like preconditions and loop invariants

03

Preliminary results show effectiveness on existing benchmarks

Abstract

We introduce a new dynamic analysis technique to discover invariants in separation logic for heap-manipulating programs. First, we use a debugger to obtain rich program execution traces at locations of interest on sample inputs. These traces consist of heap and stack information of variables that point to dynamically allocated data structures. Next, we iteratively analyze separate memory regions related to each pointer variable and search for a formula over predefined heap predicates in separation logic to model these regions. Finally, we combine the computed formulae into an invariant that describes the shape of explored memory regions. We present SLING, a tool that implements these ideas to automatically generate invariants in separation logic at arbitrary locations in C programs, e.g., program pre and postconditions and loop invariants. Preliminary results on existing benchmarks…

Tables2

Table 1. Table 1. Experimental results. Programs denoted with ∗ contain bugs preventing us to obtain traces. Programs denoted with † cause SLING to timeout at certain locations. Italic programs have locations that cannot be reached using random inputs. Bold programs contain locations with free statements that give invalid traces. SLL and DLL stand for Singly and Doubly Linked Lists, respectively.

Programs

Total

Avg. Per Inv

LoC

iLocs

Traces

Invs

A/S/X

Time(s)

Single

Pred

Pure

SLL (8): append, delAll, find, insert, reverse, insertFront, insertBack, copy

168

26

226

30

8/0/0

40.54

0.37

0.83

1.03

Sorted List (10): concat, find, findLast, insert, insertIter, delAll,

reverseSort, insertionSort, mergeSort, quickSort^∗

268

25

194

82

9/0/1

137.32

0.39

2.40

0.67

DLL (12): append, concat, meld, delAll, insertBack, insertFront, midInsert,

midDel, midDelError, midDelHd, midDelStar, midDelMid

160

31

168

238

12/0/0

399.12

0.46

1.68

3.93

Circular List (4): insertFront, insertBack, delFront, delBack

97

11

14

42(16)

2/2/0

11.43

0.81

1.19

2.10

Binary Search Tree (5): del, findIter, find, insert, rmRoot^∗

144

16

66

24

2/2/1

24.02

0.50

1.21

1.54

AVL Tree (4): avlBalance, del, findSmallest, insert

194

13

56

37

2/2/0

22.12

1.22

0.57

3.08

Priority Tree (4): del, find, insert, rmRoot

154

19

64

273

2/2/0

341.37

3.30

1.66

3.30

Red-black Tree (2): del^∗, insert

287

11

70

63

0/1/1

44.8

2.10

1.08

8.11

Tree Traversal (5): traverseInorder, traversePostorder, traversePreorder,

tree2list, tree2listIter^∗

168

12

174

12

4/0/1

22.93

0.08

0.58

0.50

glib/glist_DLL (10): find, free, index, last,

length, nth, nthData, position, prepend, reverse

216

31

128

435(20)

9/1/0

403.13

0

2.61

7.29

glib/glist_SLL (22): append, concat, copy, delLink, find, free, index,

insertAtPos, insertBefore, insertSorted, last, length, nth, nthData,

position, prepend, rm, rmAll, rmLink, reverse, sortMerge, sortReal

606

69

299

382(11)

17/5/0

879.35

0.56

2.28

2.07

OpenBSD Queue (6): init, insertAfter, insertHd, insertTl, rmAfter, rmHd

105

12

27(4)

4/2/0

10.04

0.15

2.04

0.15

Memory Region (1): memRegionDllOps

67

7

14

52

1/0/0

17.70

0.73

0.81

7.96

Binomial Heap (2): findMin, merge

117

8

54

89

0/2/0

76.56

1.39

0.90

9.15

SV-COMP (Heap Programs) (7): allocSlave, insertSlave,

createSlave, destroySlave, add, del, init

119

16

34

71

7/0/0

58.17

0.24

1.66

3.41

GRASShopper_SLL (Iterative) (8): concat, copy, dispose, filter, insert, rm,

reverse, traverse

193

27

111

98(9)

6/2/0

71.03

0.17

2.72

1.15

GRASShopper_SLL (Recursive) (8): concat, copy, dispose, filter, insert, rm,

reverse, traverse

173

24

118

40(3)

6/2/0

30.94

0.28

2.00

1.1

GRASShopper_DLL (8): concat, copy, dispose, filter^†, insert, rm,

reverse, traverse

209

24

108

638(20)

5/2/1

803.58

0.04

2.95

8.50

GRASShopper_SortedList (14): concat, copy, dispose, filter, insert, reverse,

rm, split, traverse, merge, doubleAll, pairwiseSum, insertionSort^†, mergeSort^∗

394

43

195

222(1)

10/2/2

160.1

1.04

2.27

4.29

AFWP_SLL (11): create, delAll, find, last, reverse, rotate, swap, insert, del^†,

filter, merge

264

25

89

94(11)

7/3/1

71.04

0.18

1.73

1.85

AFWP_DLL (2): dll_fix, dll_splice

40

5

16

133

2/0/0

75.51

0.02

2.96

6.67

Cyclist (4): aplas-stack, composite4, iter, schorr-waite

506

32

360

132

1/3/0

165.26

0.27

0.63

0.67

Table 2. Table 2. Comparing SLING to S2.

Programs	Total	Both	S2	SLING	Neither
SLL	9	8	0	1	0
Sorted List	14	6	0	6	2
DLL	13	0	0	13	0
Circular List	4	0	0	2	2
Binary Search Tree	6	1	1	2	2
AVL Tree	4	0	0	2	2
Priority Tree	4	1	1	1	1
Red-black Tree	2	0	0	0	2
Tree Traversal	6	3	0	2	1
glib/glist_DLL	19	0	0	18	1
glib/glist_SLL	40	6	0	29	5
OpenBSD Queue	6	0	0	4	2
Memory Region	3	1	0	2	0
Binomial Heap	2	0	1	0	1
SV-COMP	9	0	0	9	0
GRASShopper_SLL (Iter)	16	2	0	12	2
GRASShopper_SLL (Rec)	8	3	2	3	0
GRASShopper_DLL	16	0	0	13	3
GRASShopper_SortedList	29	1	0	24	4
AFWP_SLL	20	1	0	15	4
AFWP_DLL	3	0	0	3	0
Cyclist	4	0	0	1	3
Total Sum	237	33	5	162	37

Equations17

pre =

pre =

post =

\exists p, u, v . dll (x, p, u, y) * dll (y, u, v, nil) \land res = x

F_{L 3} = \exists u_{1}, u_{2}, u_{3}, u_{4}, u_{5}, tmp .

F_{L 3} = \exists u_{1}, u_{2}, u_{3}, u_{4}, u_{5}, tmp .

dll (x, u_{1}, u_{2}, tmp) * dll (tmp, x, u_{3}, y) * dll (y, u_{4}, u_{5}, nil) .

F_{L 3}^{'} = \exists u_{1}, u_{3}, u_{5}, tmp . dll (x, u_{1}, x, tmp) *

F_{L 3}^{'} = \exists u_{1}, u_{3}, u_{5}, tmp . dll (x, u_{1}, x, tmp) *

dll (tmp, x, u_{3}, y) * dll (y, u_{3}, u_{5}, nil) \land res = x .

F_{L 1}^{'} =

F_{L 1}^{'} =

dll (x, u_{1}, u_{2}, nil) * dll (y, u_{3}, u_{4}, nil) \land u_{3} = nil, and

F_{L 2}^{'} =

dll (y, u_{1}, u_{2}, nil) \land u_{1} = nil \land x = nil \land res = y .

{R * P} C {R * Q} {P} C {Q}

{R * P} C {R * Q} {P} C {Q}

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

SLING: Using Dynamic Analysis to Infer Program Invariants in Separation Logic

Ton Chanh Le

Stevens Institute of TechnologyHobokenNew JerseyUSA

[email protected]

,

Guolong Zheng

University of Nebraska-LincolnLincolnNebraskaUSA

[email protected]

and

ThanhVu Nguyen

University of Nebraska-LincolnLincolnNebraskaUSA

[email protected]

(2019)

Abstract.

We introduce a new dynamic analysis technique to discover invariants in separation logic for heap-manipulating programs. First, we use a debugger to obtain rich program execution traces at locations of interest on sample inputs. These traces consist of heap and stack information of variables that point to dynamically allocated data structures. Next, we iteratively analyze separate memory regions related to each pointer variable and search for a formula over predefined heap predicates in separation logic to model these regions. Finally, we combine the computed formulae into an invariant that describes the shape of explored memory regions.

We present SLING, a tool that implements these ideas to automatically generate invariants in separation logic at arbitrary locations in C programs, e.g., program pre and postconditions and loop invariants. Preliminary results on existing benchmarks show that SLING can efficiently generate correct and useful invariants for programs that manipulate a wide variety of complex data structures.

dynamic invariant analysis, separation logic

††copyright: none††price: 15.00††doi: 10.1145/3314221.3314634††journalyear: 2019††isbn: 978-1-4503-6712-7/19/06††conference: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation; June 22–26, 2019; Phoenix, AZ, USA††booktitle: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’19), June 22–26, 2019, Phoenix, AZ, USA††ccs: Theory of computation Separation logic††ccs: Theory of computation Invariants††ccs: Theory of computation Pre- and post-conditions††ccs: Theory of computation Program analysis††ccs: Software and its engineering Dynamic analysis††ccs: Software and its engineering Software verification

1. Introduction

A program invariant is a property that holds whenever program execution reaches a specific location. For example, a loop invariant can indicate a relation among the program variables at the loop entrance. Invariants help prove program correctness, e.g., classical verification approaches by Floyd-Hoare and Dijkstra (Hoare, 1969; Dijkstra, 1975) can be automated when given needed loop invariants and the infamous Heartbleed bug can be avoided by preserving an invariant capturing the proper size of the received payload message (Fava et al., 2015). Invariants also help developers understand programs, e.g., showing interesting or unexpected behaviors, and even discover non-functional bugs, e.g., revealing that the program has an unusual high runtime complexity (Nguyen et al., 2017). Invariants are also useful in other programming tasks, including documentation, maintenance, code optimization, fault localization, program repair, and security analysis (Leroy, 2006; Ball and Rajamani, 2002; Flanagan and Leino, 2001; Ernst et al., 2007; B Le et al., 2016; Perkins et al., 2009; Nguyen et al., 2017).

Unfortunately, software developers appear to perceive a “specification burden” (Ball and Rajamani, 2002) which leads them to eschew the writing of invariants in favor of executable code. For the past decade, researchers have been chipping away at this challenge of automatic invariant generation using static or dynamic analyses. A static analysis can reason about all program paths soundly, but doing so is expensive and is only possible to relatively small programs or simple forms of invariants, e.g., simple list structures (Berdine et al., 2004; Cook et al., 2011; Navarro Pérez and Rybalchenko, 2011). In contrast, dynamic analysis focuses on program traces observed from running the program on small sample inputs, and thus provides no correctness guarantee on generated invariants. However, dynamic analysis is generally efficient and can infer expressive invariants because it only analyzes a finite, typically small, set of traces.

Existing invariant techniques often focus on invariants over scalar variables, e.g., relations among numerical values (Padhi et al., 2016; Garg et al., 2016; Nguyen et al., 2017). However, modern programs construct and manipulate data structures, i.e., highly-structured sets of memory locations within which these scalar values are stored. Examples of such data structures are dynamically-allocated objects, e.g., heap-based objects created via the new keyword, standard data structures, e.g., lists and trees, or customized and user-defined structures that extend the standard ones and contain other structures internally. Understanding and reasoning about these heap-based programs are more challenging, e.g., even the task of accessing a variable requires checking if it points to a valid memory region (to avoid null pointer dereferencing).

An emerging approach to analyzing heap programs is to use invariants written in separation logic (SL) to represent memory structures (Reynolds, 2002; O’Hearn et al., 2001). SL extends classical logic and allows for compact and precise representations of program semantics and reasoning to be localized to small portions of memory. In the last decade, research in SL has grown rapidly and led to practical techniques used in tools such as the Facebook Infer (FBInfer) analyzer (FBInfer, 2018).

Most existing SL works focus on static analyses to obtain sound results, and therefore can only consider simple classes of invariants or programs, e.g., to support the goal of “move fast to fix more things” (O’Hearn, 2016; Calcagno et al., 2015), FBInfer only considers simple data structures and restricts supported language features. Moreover, while many static analyzers, including FBInfer, compute SL invariants internally to verify programs, we are aware of only a few researchers who have investigated reifying those invariants for consumption by developers, and even then only for a restricted language of list manipulating (Magill et al., 2006) or tree traversing programs (Brockschmidt et al., 2017). Also, most static SL tools aim to infer sufficiently strong invariants to achieve a specific goal, e.g., to prove memory safety or (programmer-provided) postconditions, and thus are not well suited for discovering useful invariants to help understand code that lacks such formal specifications.

In this work, we introduce SLING (Separation Logic Invariant Generation), a tool that dynamically discovers SL invariants for heap programs. SLING takes as inputs a program, a location of interest, a set of predefined predicates defining data structures, and a set of sample inputs. SLING next runs the program on the inputs and uses a debugger to obtain traces capturing memory information of the variables at the considered location. These traces consist of the contents of the stack and heap of the program. SLING then iteratively analyzes variables using these traces to compute invariants. For each pointer variable, SLING generates SL formulae using predefined predicates to model the traces describing memory regions related to the variable. SLING also propagates computed information to improve the analysis of other variables in subsequent iterations. Finally, SLING combines the obtained formulae into a final invariant that represents the explored memory regions.

We use SLING to infer invariants for 157 C programs taken from two existing benchmarks (VCDryad, 2018) and (Brotherston and Gorogiannis, 2014). These programs implement basic algorithms over standard data structures (e.g., singly-linked, doubly-linked, circular lists, binary trees, AVL, red-black trees, heaps, queues, stacks, iterators) and complex functions from open source libraries and the Linux kernel that manipulate customized data structures. Preliminary results show that SLING can efficiently generate invariants that are correct and more precise than the documented invariants and specifications in these programs. Even when given incomplete traces, the tool can still discover partial invariants that are useful for users. We also show that SLING’s invariants can help reason about nontrivial bugs and reveal false positives in modern SL static analyzers. We believe that SLING strikes a practical balance between correctness and expressive power, allowing it to discover complex, yet interesting and useful invariants out of the reach of the current state of the art.

2. Illustration

We describe SLING using the function concat shown in Figure 1. This function recursively concatenates two doubly linked lists x and y and returns (i) y when x is empty or (ii) a new doubly linked list by appending y to the tail of x.

Although simple, concat requires several subtle preconditions over its inputs to work properly. First, x must be a $\mathord{\mathsf{nil}}$ -terminated list, i.e., the next field of its tail node is NULL, otherwise concat may not terminate when x contains a cycle or may refer to an unallocated memory region when the next field of x’s tail node is a dangling pointer. Second, x and y must be non-overlapping, i.e., point to lists in separate memory regions, otherwise the resulting list contains a cycle. These conditions can be difficult to analyze or even to specify because they involve dynamically-allocated data structures and their separations in memory. SLING aims to automatically discover such preconditions at program entrances and, more generally, invariant properties at arbitrary program locations, including postconditions and loop invariants.

2.1. Heap Predicates

SLING infers invariants expressed as formulae in separation logic (SL) to describe properties of heap-manipulating programs. Comparing to existing works for heap programs (Jones and Muchnick, 1982; Sagiv et al., 2002), SL provides concise and expressive syntax and semantics to describe memory (shape) information (Reynolds, 2002; O’Hearn et al., 2001).

To analyze heap programs, SL works often use inductive heap predicates to compactly represent recursively-defined data structures. For concat, we use the predicate $\mathsf{\mathsf{dll}}$ to define doubly linked lists:

[TABLE]

The parameters $hd,tl,pr$ , and $nx$ point to the list’s head, tail, previous, and next element, respectively. The definition of $\mathsf{\mathsf{dll}}$ uses the built-in predicate $\mathord{\mathsf{emp}}$ to represent an empty heap, e.g., a NULL list, and the singleton predicate ${x}{\overset{\mkern-2.0mu\texttt{Node}}{\mapsto}}{nx,pr}$ to denote a memory cell that a variable $x$ of type Node shown in Figure 1 points to ( $nx$ and $pr$ correspond to the next and prev fields of $x$ , respectively111When the context is clear, we simply use ${x}{\overset{\mkern-2.0mu}{\mapsto}}{nx,pr}$ for ${x}{\overset{\mkern-2.0mu\texttt{Node}}{\mapsto}}{nx,pr}$ .).

Conceptually, $\mathsf{\mathsf{dll}}$ states that a doubly linked list is either an empty or a non-empty list, which is recursively defined by having the head $hd$ point to a doubly linked list whose head node is $u$ . In the latter case, the separating conjunction connector $*$ specifies the separation of memory regions modeled by $hd$ ’s singleton predicate and $u$ ’s $\mathsf{\mathsf{dll}}$ predicate, i.e., the heaplets of $hd$ and $u$ are disjoint.

SLING uses such heap predicates to discover invariants and specifications of heap programs. For concat, SLING uses $\mathsf{\mathsf{dll}}$ to generate the precondition on line 6 and the postcondition on line 8 and 14 in Figure 1 as

[TABLE]

These pre and postconditions form a valid specification for concat. The precondition requires that inputs x and y point to two disjoint, $\mathord{\mathsf{nil}}$ -terminated doubly linked lists. Note that unlike y, the $\mathsf{\mathsf{dll}}$ of x shows that it can take any arbitrary previous pointer, i.e., the existential argument $p$ , because this pointer changes across the recursive call on line 11.

The postcondition ensures two exit conditions: (i) when x is empty, the return value $\mathord{\texttt{res}}$ is y, and (ii) otherwise, $\mathord{\texttt{res}}=\texttt{x}$ is the result of appending y to x by changing the next element of x’s predicate from $\mathord{\mathsf{nil}}$ to y. Also, note that the previous field of y now points to the tail element of x. Lastly, the postcondition states that the separation of the heaps of x and y is preserved, i.e., concat only changes the field values of the lists and does not alter the allocated memory.

Inductive heap predicates such as $\mathsf{\mathsf{dll}}$ are standard in SL (e.g., provided by the users or predefined in an analyzer (Jacobs et al., 2011; Magill et al., 2008; Brotherston et al., 2016; Le et al., 2014)) and compactly capture crucial shape properties (e.g., doubly linked lists are acyclic). In addition, compared to normal, non-SL predicates such as isOdd or $x\geq y$ , checking SL heap predicates is nontrivial because we have to “unfold” data structures recursively and find concrete values to instantiate the existential quantifiers, e.g., the pre and postconditions above require finding the correct quantified variables and parameters in $\mathsf{\mathsf{dll}}$ .

2.2. Traces

Given a program annotated with locations of interest, SLING runs the program on sample inputs to collect execution traces. Currently, we use the LLDB debugger (LLDB, 2018) to observe execution traces containing memory addresses and values of the variables in scope at considered locations222These ”traces”, which are snapshots of program states, are often referred to as the stack-heap models in SL literature, which we review in Section 3..

Figure 2a shows two doubly linked lists x and y of size 3 and 2, respectively. When running on these inputs, SLING records traces such as those given in Figures 2. Figure 2a shows the trace obtained in the first iteration of concat at $L1$ . The trace contains information about both the stack, containing values of variables accessible at this location, e.g., $\texttt{x=0x01},\texttt{y=0x04}$ , and the heap, containing allocated memory cells reachable from the stack’s variables, e.g., $\texttt{0x01}{\mapsto}\texttt{Node\{next:0x02;prev:nil\}}$ .

Figure 2b shows three set of traces collected at $L3$ for the first three iterations of concat. Note that the values of x and tmp are different in the stacks because x and tmp change across the recursive calls in concat. However, the heap is similar because concat does not change the heap, e.g., delete or create cells, and all memory cells are still reachable from the stack variables. Moreover, the stack at $L3$ contains a ghost variable $\mathord{\texttt{res}}$ , which stores the return value of the function333This value is captured when the LLDB debugger steps out of the function and jumps back to its call site..

2.3. Inference

SLING infers invariants consisting of SL predicates such as $\mathsf{\mathsf{dll}}$ over variables at a location of interest. For each (pointer) variable, SLING explores relevant memory regions in observed traces to compute invariants for that variable. Finally, SLING combines the computed invariants to model the whole explored memory regions.

Postcondition

We now show how SLING computes the postcondition at $L3$ using the predicate $\mathsf{\mathsf{dll}}$ and the traces shown in Figure 2b. For demonstration we assume that SLING analyzes the variables at $L3$ in the order $\texttt{x},\texttt{tmp},\texttt{y},\mathord{\texttt{res}}$ .

From given traces, SLING first computes the sub-heaps of x and their boundaries. The sub-heaps contain memory cells reachable from x but not pointed to by other stack variables. The boundaries of x’s sub-heaps contain x itself, the $\mathord{\mathsf{nil}}$ pointer if it is reachable from x, and all variables reachable from x or its aliasing pointers. Next, SLING takes the intersection of the boundaries to obtain the common boundary, which consists of variables used to compute invariants in the next step. Figure 3 shows the computed sub-heaps $h_{1}^{\prime}\,{=}\,\{\texttt{0x01}\}$ , $h_{2}^{\prime}\,{=}\,\{\texttt{0x01},\texttt{0x02}\}$ , and $h_{3}^{\prime}\,{=}\,\{\texttt{0x01},\texttt{0x02},\texttt{0x03}\}$ over the three traces of x and their respective boundaries $\{\texttt{x},\mathord{\texttt{res}},\mathord{\mathsf{nil}},\texttt{tmp}\}$ , $\{\texttt{x},\mathord{\texttt{res}},\mathord{\mathsf{nil}},\texttt{tmp}\}$ , and $\{\texttt{x},\mathord{\texttt{res}},\mathord{\mathsf{nil}},\texttt{tmp},\texttt{y}\}$ . Their common boundary is $\{\texttt{x},\mathord{\texttt{res}},\mathord{\mathsf{nil}},\texttt{tmp}\}$ .

From computed sub-heaps and boundary variables, SLING searches the predefined predicates for formulae that are consistent with sub-heap traces using boundary variables. For each predicate, SLING creates candidate formulae by instantiating predicate parameters with boundary variables. It does so by enumerating different subsets of boundary variables as predicate parameters. For subsets of size smaller than the number of parameters, SLING introduces fresh existential variables to instantiate the predicate. In our example, SLING enumerates formulae such as $\exists u_{1}.\,{\mathsf{\mathsf{dll}}}{(\texttt{x},\mathord{\mathsf{nil}},\texttt{tmp},u_{1})},\exists u_{1}.\,{\mathsf{\mathsf{dll}}}{(\texttt{x},\mathord{\mathsf{nil}},u_{1},\texttt{tmp})},\dots$ .

Next, SLING then uses an SMT-based model checker to check each candidate formula against the given sub-heaps. The checker either refutes the formula, which is then discarded, or accepts it, which SLING then considers as a valid formula over the sub-heaps. Intuitively, accepted formulae represent partial invariants computed from memory regions, e.g., the sub-heaps, related to the analyzed variable.

In our example, among the generated candidates, the checker accepts the formula $F_{x}=\exists u_{1},u_{2}.\,{\mathsf{\mathsf{dll}}}{(\texttt{x},u_{1},u_{2},\texttt{tmp})}$ . This formula shows that x is a doubly linked list to the next pointer tmp. The existential variables $u_{1},u_{2}$ indicate that SLING cannot find concrete stack or $\mathord{\mathsf{nil}}$ variables for the second and third parameters of $\mathsf{\mathsf{dll}}$ from the traces.

Although $F_{x}$ holds over the given sub-heaps, it might not generalize to the entire heap in the observed traces. Thus, when analyzing $F_{x}$ , the checker also computes a residual heap, which represents the part of the heap that is not modeled by $F_{x}$ . The checker also computes a mapping from existential variables to concrete memory addresses from given traces. SLING propagates these details to improve the analyses of other variables in subsequent iterations.

SLING now continues with the other variables $\texttt{tmp},\texttt{y},\mathord{\texttt{res}}$ using the described steps and computed information (residual heaps and address mappings). For tmp, SLING computes the sub-heaps and boundary $\{\texttt{tmp},\texttt{x},\mathord{\texttt{res}},\texttt{y}\}$ , and obtains then the formula $F_{\texttt{tmp}}=\exists u_{3}.\,{\mathsf{\mathsf{dll}}}{(\texttt{tmp},\texttt{x},u_{3},\texttt{y})}$ , which indicates a doubly linked list from tmp to the next pointer y. Also, observe that the previous pointer points to x, showing the connection between this list and the one modeled by $F_{x}$ .

Similarly, for the last two variables y and res, SLING obtains the formulae $F_{y}=\exists u_{4},u_{5}.\,{\mathsf{\mathsf{dll}}}{(\texttt{y},u_{4},u_{5},\mathord{\mathsf{nil}})}$ and $F_{\texttt{res}}=\mathord{\mathsf{emp}}$ . $F_{\texttt{res}}$ is $\mathord{\mathsf{emp}}$ because every sub-heap reachable from $\mathord{\texttt{res}}$ is empty in the traces observed in the last iteration.

The obtained formulae $F_{\texttt{x}},F_{\texttt{y}},F_{\texttt{tmp}}$ , and $F_{\texttt{res}}$ model separate sub-heaps, thus SLING combines them using the $*$ operator to form a shape invariant capturing the shape of the memory at $L3$ , e.g., connections among separate heaplets:

[TABLE]

Note that the constraint $F_{\texttt{res}}=\mathord{\mathsf{emp}}$ is discarded from the conjunction. Also note that the local variable tmp is not in the scope of the function’s exit, thus SLING considers it as an existential variable in $F_{L3}$ . In general, SLING only uses the function’s parameters and the ghost variable $\mathord{\texttt{res}}$ as free variables in the function’s pre and postconditions.

SLING also examines analyzed information to find additional pure (not related to memory) relations among the stack and existential variables in the inferred formula. In this example, SLING determines that $\texttt{res}=\texttt{x}$ , indicating that the return value at $L3$ is x. It also infers aliasing information such as $\texttt{x}\,{=}\,u_{2}$ , $u_{3}\,{=}\,u_{4}$ from the address mapping. From these additional equalities, SLING derives the final result:

[TABLE]

This result $F^{\prime}_{L3}$ is correct at $L3$ and even more precise than (stronger) the postcondition shown in Section 2.1 when $\texttt{x}\neq\mathord{\mathsf{nil}}$ : $\exists p,u,v.\,{\mathsf{\mathsf{dll}}}{(\texttt{x},p,u,\texttt{y})}*{\mathsf{\mathsf{dll}}}{(\texttt{y},u,v,\mathord{\mathsf{nil}})}\wedge\mathord{\texttt{res}}\,{=}\,\texttt{x}$ . The reason is because ${\mathsf{\mathsf{dll}}}{(\texttt{x},u_{1},\texttt{x},\texttt{tmp})}*{\mathsf{\mathsf{dll}}}{(\texttt{tmp},\texttt{x},u_{3},\texttt{y})}$ in $F^{\prime}_{L3}$ entails $\exists p,u.\,{\mathsf{\mathsf{dll}}}{(\texttt{x},p,u,\texttt{y})}$ in the given postcondition. The reversed direction of this entailment does not hold as it requires a non-trivial condition $\texttt{x}\,{\neq}\,\texttt{y}$ .

Precondition and Other Invariants

Using the same inference process over the traces obtained from the input x, y given in Figure 2, SLING infers the precondition at location $L1$ and the invariant at location $L2$ of concat as

[TABLE]

These are the pre and postconditions shown in Section 2.1. From these results, we obtain the specification of concat because the complete post condition is the disjunction $F^{\prime}_{L2}\vee F^{\prime}_{L3}$ . In general, SLING can compute invariants at arbitrary program locations by applying the described inference process to traces obtained at those locations.

Depending on the program, we could create scenarios where different orders of analyzed variables produce weaker results. This is because the propagated residue information affects the computation of boundary variables and thus the instantiations of parameters in the predicates. SLING prevents these scenarios by using a simple heuristic that only selects the next variables to analyze from those directly reachable from previously considered variables. This gives a fixed order that works well in our experiments (Section 5.3).

3. Separation Logic

SL (Jones and Muchnick, 1982; Sagiv et al., 2002) has been actively used to reason about imperative programs that manipulate data structures. Crucially, SL uses the separating conjunction operation $*$ to describe the separation of computer memory, i.e., the assertion $p*q$ states that $p$ and $q$ hold for disjoint memory regions. Moreover, SL is often equipped with the ability for users to define inductive heap predicates (such as the predicate $\mathsf{\mathsf{dll}}$ used in Section 2 for doubly linked list). The combination of the $*$ operator and heap predicates make SL expressive enough to model various types of data structures.

Figure 4 shows the syntax and semantics of the SL formulae we consider in this work. These represent the standard symbolic-heap fragment of SL (Brotherston et al., 2016; Ta et al., 2016, 2018) with user-defined inductive heap predicates.

Syntax

We denote $x$ as a variable, $k,e$ as an integer constant and an integer expression, respectively, $\mathord{\mathsf{nil}}$ as a constant denoting a dangling memory address (null), and $a$ as a spatial expression modeling a memory address. The predicate $\mathord{\mathsf{emp}}$ models an empty heap, the singleton heap predicate ${x}{\overset{\mkern-2.0mu\tau}{\mapsto}}{t_{1},...,t_{n}}$ models an $n$ -field data structure type $\tau$ where $x$ points to, and the inductive heap predicate ${\mathord{\mathsf{p}}}{(t_{1},...,t_{n})}$ models a recursively defined data structure. The spatial formulae $\Sigma$ consist of these predicates and their compositions using the $*$ separating conjunction operator. $\Pi$ denotes pure formulae in linear arithmetic, which do not contain any predicates. Note that we can negate the presented formulae to obtain formulae involving disjunctions, universal quantifiers, and other comparison relations.

Semantics

Given a set Var of variables, Type of types, Val of values, and $\mathsf{\texttt{Loc}}\,{\subset}\,\mathsf{\texttt{Val}}$ of memory addresses, an SL stack-heap model, i.e., concrete trace, is a pair of a stack model $s$ , which is a function $s{:}~{}\mathsf{\texttt{Var}}\rightarrow\mathsf{\texttt{Val}}$ , and a heap model $h$ , which is a partial function $h{:}~{}\mathsf{\texttt{Loc}}\rightharpoonup(\mathsf{\texttt{Type}}\times\mathord{\texttt{Val}}^{*})$ . We write $\llbracket{\Pi}\rrbracket_{s}$ to denote the valuation of a formula $\Pi$ under the stack model $s$ and ${s,h}\mathrel{\raisebox{1.00006pt}{\scalebox{1.0}[0.85]{$ | $}}\mkern-2.5mu\mkern-1.0mu\raisebox{0.80002pt}{\scalebox{0.95}[1.0]{$ = $}}}{F}$ to denote a model $s,h$ satisfies a formula $F$ . Moreover, ${\mathrm{dom}(h)}$ denotes the domain of $h$ , $h_{1}\,{\mathrel{\#}}\,h_{2}$ denotes $h_{1}$ and $h_{2}$ have different domains, and $h_{1}\,{\circ}\,h_{2}$ denotes the union of two disjoint heaps $h_{1}$ and $h_{2}$ , and $[{f}\,|\,x\,{:}\,\,y]$ indicates a function like $f$ but returns $y$ for input $x$ . We also define the heap union and difference operators over two sequences of stack-heap models as $(s_{i},h_{i})_{i=1}^{n}\oplus(s_{i},h^{\prime}_{i})_{i=1}^{n}\mathrel{\stackrel{{\scriptstyle\makebox[0.0pt]{\mbox{\tiny def}}}}{{=}}}(s_{i},h_{i}\circ h^{\prime}_{i})_{i=1}^{n}$ and $(s_{i},h_{i})_{i=1}^{n}\setminus(s_{i},h^{\prime}_{i})_{i=1}^{n}\mathrel{\stackrel{{\scriptstyle\makebox[0.0pt]{\mbox{\tiny def}}}}{{=}}}(s_{i},h_{i}\setminus h^{\prime}_{i})_{i=1}^{n}$ , respectively.

Model Checking

We follow the technique given in (Brotherston et al., 2016) to implement a model checker, which checks if a formula $F$ is satisfied by a stack-heap model $s,h$ and returns a residual heap $h^{\prime}$ , i.e., memory regions in $h$ not modeled by $F$ , and an instantiation $\iota$ that maps existential variables in $F$ to concrete addresses in the model. These checking and instantiation tasks are encoded as logical formulae solvable using the Z3 SMT solver (De Moura and Bjørner, 2008).

Note that the model checking technique proposed in (Brotherston et al., 2016) does not return the instantiation $\iota$ , which is needed by SLING to compute equalities among variables in $F$ . To obtain $\iota$ , we slightly redefine the problem with a new satisfaction relation:

Definition 0 (Satisfaction Relation with Existential Instantiation).

The relation ${s,h}\mathrel{\raisebox{1.00006pt}{\scalebox{1.0}[0.85]{$ | $}}\mkern-2.5mu\mkern-1.0mu\raisebox{0.80002pt}{\scalebox{0.95}[1.0]{$ = $}}}_{\iota}{F}$ is the satisfaction relation ${s,h}\mathrel{\raisebox{1.00006pt}{\scalebox{1.0}[0.85]{$ | $}}\mkern-2.5mu\mkern-1.0mu\raisebox{0.80002pt}{\scalebox{0.95}[1.0]{$ = $}}}{F}$ except that the value of an existential variable in $F$ is obtained from the instantiation $\iota$ , which is a function from Var to Val similar to the stack model.

We also lift this relation to sequences of stack-heap models and instantiations as ${(s_{i},h_{i})_{i=1}^{n}}\mathrel{\raisebox{1.00006pt}{\scalebox{1.0}[0.85]{$ | $}}\mkern-2.5mu\mkern-1.0mu\raisebox{0.80002pt}{\scalebox{0.95}[1.0]{$ = $}}}_{(\iota_{i})_{i=1}^{n}}{F}\mathrel{\stackrel{{\scriptstyle\makebox[0.0pt]{\mbox{\tiny def}}}}{{=}}}\forall i.\;{s_{i},h_{i}}\mathrel{\raisebox{1.00006pt}{\scalebox{1.0}[0.85]{$ | $}}\mkern-2.5mu\mkern-1.0mu\raisebox{0.80002pt}{\scalebox{0.95}[1.0]{$ = $}}}_{\iota_{i}}{F}$ .

Definition 0 (Symbolic-heap Model Checking).

A reduction ${s,h}\mathrel{\raisebox{1.00006pt}{\scalebox{1.0}[0.85]{$ \parallel $}}\mkern-3.2mu\raisebox{1.00006pt}{\scalebox{0.9}[1.0]{$ - $}}}{F}\rightsquigarrow{h^{\prime},\iota}$ is valid if $h^{\prime}\subseteq h$ and ${s,h\setminus h^{\prime}}\mathrel{\raisebox{1.00006pt}{\scalebox{1.0}[0.85]{$ | $}}\mkern-2.5mu\mkern-1.0mu\raisebox{0.80002pt}{\scalebox{0.95}[1.0]{$ = $}}}_{\iota}{F}$ .

Definition 3.2 redefines the model checking reduction relation to return, in addition to the residual heap model $h^{\prime}$ , an instantiation $\iota$ of existential variables in $F$ that satisfies the relation ${s,h\setminus h^{\prime}}\mathrel{\raisebox{1.00006pt}{\scalebox{1.0}[0.85]{$ | $}}\mkern-2.5mu\mkern-1.0mu\raisebox{0.80002pt}{\scalebox{0.95}[1.0]{$ = $}}}_{\iota}{F}$ in Definition 3.1.

4. The SLING Algorithm

Algorithm 1 shows the implementation of SLING. Given a program $\mathsf{C}$ , a set $\mathsf{P}$ of predefined inductive heap predicates, a target location $l$ in $\mathsf{C}$ , and a test suite $\mathsf{T}$ , SLING returns a set $\mathsf{R}$ of SL formulae satisfied by observed traces at $l$ .

SLING infers invariants using the three main phases described below. In the following we use the term stack-heap models to refer to concrete traces.

Model Collection (line 1)

SLING first calls $\mathsf{CollectModels}$ to collect all stack-heap models observed at location $l$ when running the program over the tests in $\mathsf{T}$ . $\mathsf{CollectModels}$ uses a software debugger such as LLDB to set a breakpoint at $l$ and inspect the memory layout when executing the program. It then collects the set of stack-heap models $\mathsf{SH}$ from the memory whenever hitting the breakpoint at $l$ .

Inference (lines 2–11)

After obtaining the stack-heap models at $l$ , SLING performs a heap inference and then a pure inference to derive a set of results satisfied by the models. SLING uses an iterative refinement process over the stack variables to infer invariants. At each iteration, SLING updates the result set $\mathsf{R}$ with a tuple $(F,\mathsf{SH},\mathsf{I})$ , where the formula $F$ holds for the models analyzed in the previous iteration, the set of stack-heap models $\mathsf{SH}$ captures the residue of the initial heaps that are not modeled by $F$ , and the sequence $\mathsf{I}$ contains existential instantiations which map the existential variables in $F$ to concrete memory addresses.

In each iteration, given a stack variable $v\in\mathsf{V}$ and a tuple $(F,\mathsf{SH},\mathsf{I})\in\mathsf{R}$ , SLING derives a set of atomic heap predicates (i.e., inductive heap predicates, singleton heap predicates, or $\mathord{\mathsf{emp}}$ ), which models the sub-heaps in $\mathsf{SH}$ that contain memory cells reachable from $v$ . The heaps modeled by these predicates and $F$ are disjoint, thus we can strengthen $F$ with each predicate using the $*$ operator of SL. Intuitively, SLING splits the original stack-heap models into multiple sub-heaps, which are pointed-to by distinct (non-aliasing) stack variables. To model a sub-heap, SLING derives atomic formulae from the given predicates and the stack variables related to the sub-heap. Sections 4.1 and 4.2 describes the two sub-procedures $\mathsf{SplitHeap}$ and $\mathsf{InferAtom}$ , respectively.

In addition to finding invariants describing shape properties, SLING infers equality constraints over stack variables in $\mathsf{V}$ to represent pure properties. Section 4.3 describes the $\mathsf{InferPure}$ procedure that performs this step.

Validation

When we discover both pre and postconditions, we combine them to obtain program specifications. We also leverage the frame rule of SL to check if this combination is consistent with respect to the corresponding residual models. Thus, when inferring invariants at multiple return statements, SLING has an additional step that combines and validates formulae inferred at these locations. We describe this step in Section 4.4.

4.1. Heap Partitioning

Given a sequence $\mathsf{SH}$ of collected stack-heap models, SLING calls $\mathsf{SplitHeap}$ to splits the heap in each model $s_{i},h_{i}\in\mathsf{SH}$ into smaller sub-heaps so that each of them can be modeled by atomic heap predicates. Moreover, $\mathsf{SplitHeap}$ returns the common boundary of these sub-heaps, which consist of the $\mathord{\mathsf{nil}}$ and stack variables that are subsequently used to determine the arguments for these atomic heap predicates.

$\mathsf{SplitHeap}$ uses a depth-first search to traverse the pointer fields of memory cells from a $root$ pointer to partition the heap model $h_{i}$ into two non-overlapping parts: sub-heap $h^{\prime}_{i}$ and the remaining sub-heap $h^{\prime\prime}_{i}=h_{i}\setminus h^{\prime}_{i}$ . The sub-heap $h^{\prime}_{i}$ contains memory cells reachable from the $root$ pointer variable up to the $\mathord{\mathsf{nil}}$ pointer or memory cells pointed to by other stack pointer variables. We call these pointer variables or the $\mathord{\mathsf{nil}}$ pointer the boundary between the sub-heap $h^{\prime}_{i}$ and the other memory regions in the heap $h_{i}$ . The sub-heap $h^{\prime\prime}_{i}$ may contain memory cells unreachable from $root$ and those reachable from $root$ , but also pointed-to by other stack variables non-aliasing with $root$ .

For the concat example, Figure 3 illustrates that the boundaries of the sub-heaps $h^{\prime}_{1}$ , $h^{\prime}_{2}$ , and $h^{\prime}_{3}$ of the root variable x are $\{\texttt{x},\mathord{\texttt{res}},\mathord{\mathsf{nil}},\texttt{tmp}\}$ , $\{\texttt{x},\mathord{\texttt{res}},\mathord{\mathsf{nil}},\texttt{tmp}\}$ , and $\{\texttt{x},\mathord{\texttt{res}},\mathord{\mathsf{nil}},\texttt{tmp},\texttt{y}\}$ , respectively. Their common boundary is $\{\texttt{x},\mathord{\texttt{res}},\mathord{\mathsf{nil}},\texttt{tmp}\}$ .

4.2. Inferring Atomic Heap Predicates

Given the sequence of sub-models $\mathsf{SH}_{root}$ of the $root$ pointer and its boundary $\mathsf{B}$ , the function $\mathsf{InferAtom}$ shown in Algorithm 2 computes a set of atomic predicates satisfied by all sub-models in $\mathsf{SH}_{root}$ . These atomic (shape) predicates consist of either (i) inductive heap predicates whose definitions are given in the set $\mathsf{P}$ (lines 2–11), (ii) singleton predicates of the $root$ pointer when the heap size of all sub-models in $\mathsf{SH}_{root}$ is 1 (lines 12–13), or (iii) the $\mathord{\mathsf{emp}}$ predicate with $\mathsf{SH}_{root}$ as the residual models when it cannot derive any predicates in the two former forms (line 14).

Inductive Heap Predicates

SLING discovers instances of each predefined predicate ${\mathord{\mathsf{p}}}{(t_{1},...,t_{n})}\in\mathsf{P}$ . For optimization, we filter the set $\mathsf{P}$ of predicate definitions to contain only those that have at least one parameter having the same type as the $root$ pointer. Also, for simplicity of presentation, we assume that the parameters $t_{1},...,t_{n}$ of $\mathord{\mathsf{p}}$ are pointer types.

SLING chooses potential arguments of predicate $\mathord{\mathsf{p}}$ from the common boundary $\mathsf{B}$ of all sub-heaps in the sub-models $\mathsf{SH}_{root}$ . It searches for these arguments from all permutation of $\mathsf{B}$ ’s subsets whose size is less than or equal to $n$ and contains $root$ (line 3). The inferred inductive predicates can contain as many stack variables as its arguments, thus we examine each subset in the ascending order of their size. Also, to reduce the search space, we only consider a permutation $(k_{1},...,k_{n})$ if it is type-consistent with the parameters $t_{1},...,t_{n}$ of the predicate $\mathord{\mathsf{p}}$ . That is, if $k_{i}$ is a stack pointer variable in $\mathsf{B}$ then its type must be a subtype of the corresponding parameter $t_{i}$ ’s type (line 8).

Next, we construct a formula $F$ from the inductive heap predicate ${\mathord{\mathsf{p}}}{(k_{1},...,k_{n})}$ (line 9). A formula $F$ is valid if it is successfully checked by all models in $\mathsf{SH}_{root}$ (line 10). This validity check also returns a residual heap $h^{\prime}_{i}$ and an existential instantiation $\iota_{i}$ for each stack-heap model $s_{i},h_{i}$ in $\mathsf{SH}_{root}$ . They are respectively the member of the sequence of residual models and the sequence of existential instantiations associating with the valid formula $F$ as an inference result in the set $\mathsf{R}$ (line 11).

In the concat example, when selecting the argument set $\{\texttt{x},\mathord{\texttt{res}},\texttt{tmp},\mathord{\mathsf{nil}}\}\in\mathsf{A}$ , the algorithm derives the formula $F_{x}=\exists u_{1}.\,{\mathsf{\mathsf{dll}}}{(u_{1},\mathord{\mathsf{nil}},\texttt{x},\texttt{tmp})}$ . This result shows that x is the last node of doubly linked lists whose head is $u_{1}$ and its next pointer is tmp. Moreover, $F_{x}$ models the whole sub-heaps of x in Figure 3, i.e., all residue models have empty heaps, when the existential variable $u_{1}$ is instantiated to the address 0x01. As another example, when considering another set of potential arguments $\{\texttt{x},\texttt{tmp}\}\in\mathsf{A}$ , we infer $F_{x}=\exists u_{1},u_{2}.\,{\mathsf{\mathsf{dll}}}{(\texttt{x},u_{1},u_{2},\texttt{tmp})}$ , indicating that x is the head of a doubly linked list segment to tmp.

Singleton Heap Predicates

We only derive singleton heap predicates of $root$ when there is a single memory cell in every $root$ ’s sub-model in $\mathsf{SH}_{root}$ (line 12). We consider a $\tau$ -typed singleton predicate template of the form ${root}{\overset{\mkern-2.0mu\tau}{\mapsto}}{k_{1},...,k_{n}}$ . The value of each field $k_{i}$ in the template is the common pointer variable (including $\mathord{\mathsf{nil}}$ ) pointing to the corresponding field of every memory cell in $\mathsf{SH}_{root}$ . If there is no such variable, we create a fresh existential variable for $k_{i}$ and update this variable’s instantiation to the value of the corresponding field in each model.

4.3. Pure Inference

The heap predicates derived in the previous steps mainly present the heap memory, but not the relations of variables within a predicate and among predicates in the overall results. In these results, the heap predicates are solely related via the common stack variables in their arguments.

We infer additional pure constraints over arguments of the predicates by searching for equality constraints over two different variables among stack variables, existential variable, $\mathord{\mathsf{nil}}$ , and the special variable $\mathord{\texttt{res}}$ if we are inferring post-conditions which are satisfied by every stack model and existential instantiation.

For example, we use this inference to obtain the relation $\mathord{\texttt{res}}\,{=}\,\texttt{x}$ about the return value of concat and other aliasing information, e.g., those shown in Section 2.3.

4.4. Validation

When obtaining multiple postconditions (e.g., at each return statements), we combine them with the inferred preconditions to derive program specifications. Next, we validate these specifications using frame rule of separation logic, i.e.,

[TABLE]

This rule says that if a triple ${\{P\}\mathsf{C}\{Q\}}$ is valid (i.e., $\mathsf{C}$ executes safely in the precondition $P$ and its post-states satisfying the postcondition $Q$ ) then the triple ${\{R*P\}\mathsf{C}\{R*Q\}}$ , in which $R$ is a frame modeling memory regions that are not manipulated by $\mathsf{C}$ , also holds.

For example, if $\mathsf{C}$ is a function, then $P$ and $Q$ are the pre and postconditions computed from the stack-heap models observed at the entry and exit of $\mathsf{C}$ , respectively. As another example, if $\mathsf{C}$ is a loop body, then $P$ and $Q$ , inferred from the models collected at the loop’s head, are identical and considered as a loop invariant.

According to the frame rule, if the inferred conditions $P$ and $Q$ are valid, then we can expand the corresponding memory regions modeled by $P$ and $Q$ by the same memory regions non-overlapping with them to obtain the whole memories observed at the entry and exit $\mathsf{C}$ . Otherwise, the inferred pair $P,Q$ is considered invalid (spurious). Therefore, we can check that the residual models corresponding to $P$ and $Q$ are unchanged with respect to observed models to determine the validity of this result.

In concat, we obtain the precondition $F^{\prime}_{L1}$ and two postconditions $F^{\prime}_{L2}$ (when $\texttt{x}\,{=}\,\mathord{\mathsf{nil}}$ ) and $F^{\prime}_{L3}$ (when $\texttt{x}\,{\neq}\,\mathord{\mathsf{nil}}$ ). For each pair of a model collected at $L1$ and the corresponding model collected at $L2$ or $L3$ , we check if the residual heap in the model at $L1$ , which is not captured by $F^{\prime}_{L1}$ , is the same as the residual heap in model at $L2$ or $L3$ , which is not captured by $F^{\prime}_{L2}$ or $F^{\prime}_{L3}$ , respectively. For example, given the model at $L1$ in Figure 2a and its corresponding model $t_{1}$ at $L3$ in Figure 2b, the residual heaps corresponding to $F^{\prime}_{L1}$ and $F^{\prime}_{L3}$ are both empty. On the other hand, in the last iteration of concat (when $\texttt{x}\,{=}\,\mathord{\mathsf{nil}}$ ), the residual heaps of $F^{\prime}_{L1}$ and $F^{\prime}_{L2}$ both contain three memory cells 0x01, 0x02, and 0x03.

4.5. Complexity

SLING is exponential in the number of predicates and their parameters. In addition, the complexity of the general heap model checking problem is EXPTIME (Brotherston et al., 2016). Thus, checking predicates over combinations of variables over many collected stack-heap models can be slow. To improve performance, SLING uses a type-checker (Algorithm 2, line 8) to eliminate variable combinations having inconsistent types to observed traces. The experiments in Section 5 also show that only a few traces are needed to discover accurate invariants and the Z3-based model checker performs efficiently over these traces.

5. Evaluation

SLING is implemented in Python and uses the LLDB debugger (LLDB, 2018) to collect traces at target program locations. Below we evaluate SLING on C programs, but SLING also works with programs written in other languages supported by LLDB (e.g., C++ and Objective-C) or having debuggers capable of capturing memory information (e.g., JDB for Java (JDB, 2018), PDB for Python (PDB, 2018), and GDB (GDB, 2018)).

Our experiments described below are conducted on a MacBook with 2.2GHz Intel CPU, 16 GB memory, and runs Mac OS. The source code of SLING and experimental data are available at https://github.com/guolong-zheng/sling/.

5.1. Benchmark Programs

We evaluate SLING using the VCDryad benchmark (VCDryad, 2018) consisting of 153 C heap-manipulating programs collected from various verification works, e.g., SV-COMP (Beyer, 2017), GRASShopper (Piskac et al., 2014) and AFWP (Itzhaky et al., 2013). These programs range from those that manipulate standard data structures (e.g., heaps and trees) to functions from popular open source libraries (e.g., Glib, OpenBSD) and the Linux kernel that manipulate customized data structures. Some of these programs have nontrivial bugs (e.g., causing segmentation faults) intended to test static analyzers. We also use 4 programs444This benchmark has 6 programs, we use 4 of them and exclude the other 2 because they use concurrency which SLING currently does not support. from (Brotherston et al., 2016). These programs implement non-trivial algorithms using multiple data structures (e.g., the Schorr-Waite graph marking algorithm using binary trees).

In total, these benchmark programs contain a wide variety of structures including singly-linked lists, doubly-linked lists, sorted lists, circular lists, binary trees, AVL trees, red-black trees, heaps, queues, stacks, iterators, etc. Moreover, these programs contain documented invariants (e.g., pre and postconditions such as those given in Section 2.1), which we use to evaluate SLING’s inferred invariants.

Table 1 shows these 157 programs (the last row shows the 4 programs from (Brotherston et al., 2016)). Column Programs lists the programs, categorized by data structures that they use. Column LoC shows the total lines of code of these programs. For example, the first row lists 8 programs that use standard singly-link lists (SLL) and have in total 168 LoC. In total, we have 157 programs in 22 categories with 4649 lines of C code.

5.2. Setup

For each program, we obtain traces, i.e., stack-heap models, to infer invariants at program entrances for preconditions, at loop entrances for loop invariants, and at program exits for postconditions (we systematically obtain traces at each return statement in a program). We use LLDB to set breakpoints at these locations to collect traces.

To obtain traces, we run each program on empty and randomly generated data structure inputs of a fixed size of 10. For example, for the concat program in Figure 1 that takes as input 2 doubly-linked lists, we generate 3 inputs consisting of a nil list and two randomly generated doubly-linked lists $a,b$ of size 10. Then we run concat over all input combinations, e.g., (nil,a), (nil,b), (a,b), …. Although these inputs are random and relatively small (size 10), the benchmark programs often modify and loop over data (e.g., as in concat), allowing us to generate sufficient and diverse traces.

For each category shown in Table 1, we adopt the predicate definitions given for that data from the benchmark programs, e.g., all programs DLL use the $\mathsf{\mathsf{dll}}$ inductive predicate shown in Section 2. The shape and complexity of these predicates vary, e.g., $\mathsf{\mathsf{dll}}$ has 4 parameters, 1 singleton predicate, and 1 inductive predicate, and the treeSeg predicate has 2 parameters, 2 singletons, and 4 inductive predicates.

Programs in several categories such as SV-COMP and Cyclist use complex nested data structures, which are data structures whose fields are other data structures. The predicates for these data structures involve multiple predicates that are quite complex, e.g., the iter predicate has 10 parameters, 5 singleton and 6 inductive predicates.

5.3. Results

Table 1 shows our results. Columns iLocs, Traces, and Invs lists, for programs in each category, the total number of target locations, obtained traces, and generated invariants, respectively. Column Invs also lists the number of spurious invariants in parentheses (rows with no such parentheses have no spurious invariants). Finally, column Time lists the total analysis time in seconds (including program execution, trace collection, and invariant inference).

For several programs, we were not able to obtain traces at considered locations using random inputs and thus could not infer invariants at those locations. Column A/S/X shows the number of programs where we obtained traces at all considered locations (A), obtained traces for some locations or inferred spurious results (S), and could not obtain traces or invariants at some locations (X). For example, for the 5 programs using binary search trees, we obtained traces at all considered locations in 2 programs, obtained traces at some locations in 2 programs, and could not obtain any traces in one program (quicksort).

The last three columns in the table give additional details about the generated invariants. Columns Single, Pred, and Pure list the average numbers of singleton predicates (e.g., ${x}{\overset{\mkern-2.0mu\texttt{node}}{\mapsto}}{nx,pr}$ ), inductive predicates (e.g., $\mathsf{\mathsf{dll}}$ ), and pure equalities (e.g., $x=\mathord{\texttt{res}}$ ) found in the invariants, respectively.

In total, SLING generated 3214 invariants in 487 target locations (average 6.60 invariants per location). These invariants consists of 309 preconditions, 2442 postconditions and 463 loop invariants. The total run time of SLING is 3866.06s for 149 programs555We exclude the 5 buggy programs that produce no traces and 3 programs that cause Z3 to time out. (average 25.95s per program and 1.2s per invariant). The time to run the program and collecting traces is negligible (about a second for all programs).

Out of 157 programs, we were not able to obtain any traces for 5 programs (marked with ∗ in the table). These programs contain bugs that immediately result in runtime errors such as segmentation faults (thus we obtained no traces and inferred no invariants). For 15 programs (italic text), we could not reach certain return branches using random inputs and thus were not able to obtain traces or infer invariants at those locations. For 3 programs (marked with $\dagger$ ), we were able to generate pre and postconditions, but not loop invariants. For these programs we hit loops more frequently than program entrance/exit points and thus obtained many traces for loops. Checking generated formulae over many traces is expensive (Section 4.5) and appears to cause Z3 to stop responding. Finally, for 17 programs (bold text), we obtained invalid traces and therefore generated spurious invariants. This is an interesting behavior of running C programs and the LLDB debugger: a free(x) statement does not immediately free the pointer x so LLDB still observes (now invalid) heap values of x in the execution traces. Thus we conservatively consider all generated invariants depending on these traces spurious and report them in Table 1.

For other programs (and those where we only obtain traces at certain locations), we manually analyzed and compared SLING’s generated invariants to documented ones. First, we found that all generated invariants are correct, i.e., they are true invariants at the considered locations. Thus, the spurious results reported in Table 1 are only those caused by invalid traces as described above. Second, our results either matched (syntactically or semantically equivalent) or, in many cases, were stronger than the documented invariants. For example, for SLL/reverse, we inferred the documented postcondition ${\texttt{sll}}{(\mathord{\texttt{res}})}$ and the additional constraints $x\,{=}\,\mathord{\mathsf{nil}}\wedge x\,{=}\,\textsf{tmp}$ showing that the header of the input list $x$ becomes the tail of the resulting list. In many similar cases, we achieved stronger results by inferring both the expected invariants and additional equalities.

A potential reason for these sound results is because SLING only infers shape properties using inductive predicates and pure equalities. These properties have strict patterns and thus a property that holds for the observed traces will likely hold for others. We also do not consider general disjunctive invariants or numerical relations (e.g., only check equivalences among memory addresses and do not consider other relationships such as the address of x is greater than that of y). Existing numerical invariant studies (Nguyen et al., 2014a, b) have shown that dynamic analysis often produces many spurious invariants involving disjunctions and general inequalities.

Although we tried our best to carefully check all generated results, the process of checking many complex SL invariants manually is time-consuming and difficult. In future work, we will use an automatic verifier that supports SL formulae to check SLING’s invariants (see additional details in Section 6). Moreover, we might be able to leverage test-input generation techniques, e.g., symbolic execution with lazy-initialization (Khurshid et al., 2003) or SL predicates (JSF, 2018), to construct smart inputs, which can explore hard-to-reach program paths to infer better invariants.

5.4. Uses of Inferred Invariants

Dynamically inferred invariants can help users understand programs (e.g., discovering loop invariants, pre and postconditions for unknown programs) and gain confidence about expected properties (e.g., the generated invariants met the expectation). They can also be used to catch regression bugs: the user instruments these invariants as assertions in code to detect changes that break these assertions when the program run666The work in (Nguyen et al., 2008) shows how to encode SL formulae, which contain non-standard operations such as ∗, to executable functions that can be used assertions in code to enable run-time checking.. Existing works also list many other uses of dynamic invariants including documentation, complexity analysis, fault localization, and bug repair (Ernst et al., 2007; B Le et al., 2016; Perkins et al., 2009; Nguyen et al., 2017). Below we show two concrete uses of SLING’s SL invariants.

Explaining Bugs

Although SLING cannot generate inputs to reach a buggy location, it can, when given such inputs, discover useful invariants to alert and help the developer analyze that bug. For Red-black Tree/insert, we obtained an invariant that appears too “simple”. Manual inspection showed that the inferred invariant is indeed correct: the program always crashes after the first iteration, thus the inferred invariant only captures a portion of data operated during the first iteration. For glib/glist_SLL/sortMerge, SLING reported an unexpected postcondition stating that the result is always null. Manual inspection revealed this is correct and is due to a (typo) bug in the program that returns list_next instead of list->next. For AFWP/dll_fix.c, the expected loop invariant is $\exists u_{1},u_{2},u_{3},u_{4}.\,{\texttt{sll}}{(i)}*{\mathsf{\mathsf{dll}}}{(j,u_{1},k,u_{2})}*{\mathsf{\mathsf{dll}}}{(k,u_{3},u_{4},\mathord{\mathsf{nil}})}$ , but SLING returned ${\texttt{sll}}{(i)}\wedge i\,{=}\,h\wedge k\,{=}\,j\wedge k\,{=}\,\mathord{\mathsf{nil}}$ . Thus, the expected invariant shows that $k$ can be non- $\mathord{\mathsf{nil}}$ , but SLING’s invariant shows the opposite. Manual inspection showed a (potentially seeded) bug, where a guard checking for $k\,{=}\,\mathord{\mathsf{nil}}$ was commented out. Indeed, with this guard uncommented, SLING inferred the expected invariant.

Identifying Spurious Warnings

SLING’s invariants can help check results from static analyzers, e.g., to understand and gain confidence about reported results or detect potential problems. The FBInfer tool mentioned in Section 1 is a well-known SL static analyzer that produces warnings for memory safety bugs for iOS and Android apps. However, FBInfer can produce spurious (false positive) warnings. For example, when analyzing the correct version of the mentioned glib/glist_SLL/sortMerge program, FBInfer reported a memory leak after the assignment l->next = NULL; at the end of a loop because it thinks that l->next is not reachable. However, SLING’s inferred invariants at that location showed l->next is a valid alias to other pointer variables and reachable. Manual inspection confirmed that SLING’s generated invariants are correct and the program has no memory leak at that location. We applied the same technique and found similar spurious warnings from FBInfer for 7 other programs777Programs with spurious warnings: merge in Binomial_Heap, delBack in Circular_List, copy in Grasshopper_SLL(Rec), insert in GRASShopper_DLL, GRASShoper_SLL(Iter), GRASShoper_SortedList, and Grasshopper_SLL(Rec)..

Note that FBInfer also reported an error at another location of sortMerge. However, this time, SLING’s invariants confirmed the warning and even revealed that the error is caused by a dangling pointer.

5.5. Comparing to the S2 Static Analyzer

We compare SLING to the static tool S2 (Le et al., 2014), which uses the state-of-the-art bi-abduction technique (Calcagno et al., 2011) in SL to generate invariants proving memory safety of C programs, e.g., no null pointer dereferencing and memory leaks. In addition to memory checking, S2 attempts to find strongest specifications consisting of pre and postconditions for heap programs.

We compare S2 to SLING using the same C benchmark programs888Several of these programs, e.g., those in SLL, DLL, and Binary trees, are also used in (Le et al., 2014) to evaluate S2’s capability of proving memory safety. listed in Table 1. S2 only supports shape invariants, thus we only compare shape invariants generated by the two tools and ignore the pure invariants generated by SLING. Moreover, S2 does not infer invariants at arbitrary locations like SLING, instead it attempts to find complete specifications (involving both pre and postconditions) and loop invariants. Thus, we do not consider invariants generated at individual locations as shown in Table 1 and instead consider specifications as a whole and loop invariants. Note that each program has a specification but only programs with loops have loop invariants. As with SLING, we manual analyze the results of S2 and compare them to the documented invariants999We use these documented, verified invariants as “ground truths” and check if S2’s results can match them. It is possible that these invariants are incomplete and not the “best” and thus non-matching results from S2 do not necessarily mean they are worse than these invariants..

Table 2 shows the comparison results. Column Programs lists the program categories, similarly to those listed in Table 1. Column Total lists the number of documented properties consisting of specifications and loop invariants for the programs in the corresponding category. The next four columns list the respective numbers of properties that Both tools can generate, S2 can generate but SLING cannot, SLING can generate but S2 cannot, and Neither tools can generate. For example, the 10 programs in Sorted List have 14 properties (10 specifications and 4 loop invariants), from which there are 6 properties that both tool found, none that only S2 found, 6 that only SLING found, and 2 that both fail to find. Finally, S2 takes less than a second for all but the 4 concat programs in the GRASShopper categories, from which S2 appears to stop responding.

The last row of Table 2 summarizes the results. First, both tools discovered and failed about 14% and 15% of the properties, respectively. Properties found by both tools are from simple recursive programs with singly-linked lists and trees. For programs containing properties that neither tool found, we observed no patterns and different failure reasons, e.g., for SLL/quicksort, S2 did not produce any specification while SLING inferred no properties due the program crashed and produced no traces. Next, S2 found 5 properties that SLING did not. These properties are mostly in programs where SLING obtained incomplete or spurious results due to lack of traces, e.g., binary_search_tree/find_rec.c, or the “free” problem, e.g., GRASShopper/rec/dispose.c (details in Section 5.3). Finally, SLING found many properties that S2 did not (162/237). For these properties, which often come from complex programs with rich data structures, S2 either completely failed to produce them or produced much weaker than expected results.

In summary, SLING found more documented invariants comparing to S2. We find this result encouraging as it shows the competitiveness of SLING to static analyzers.

6. Related Work

SLING is inspired by the well-known dynamic invariant tool Daikon (Ernst et al., 2007; Ernst et al., 2001). Daikon comes with a large list of invariant templates and predicates, tests them against program traces, removes those that fail, and reports the remains as candidate invariants. Recently, several techniques (such as PIE (Padhi et al., 2016), ICE (Garg et al., 2016), DIG (Nguyen et al., 2014a), and SymInfer (Nguyen et al., 2017)) have been developed to infer numerical invariants using a hybrid approach that dynamically infers candidate invariants and then statically checks them against the program code. These approaches do not consider SL invariants for memory shape analysis.

Static program analysis in SL has rapidly gained adoption from both academia and industry in the past decade. MemCAD (Illous et al., 2017) and THOR (Magill et al., 2010, 2008) reason about shape and numerical properties of programs, but generate invariants for a restricted language of list manipulating programs (Magill et al., 2006). FBInfer (Calcagno et al., 2015; FBInfer, 2018) uses bi-abduction to generate invariants to detect real memory bugs, but only supports simple structures (e.g., linked lists) and restricted language features (e.g., no arithmetic). CABER (Brotherston and Gorogiannis, 2014) and S2 (described in Section 5.5) also use bi-abduction to offer more general, but more expensive, approaches to infer shape properties. These tools do not consider invariants at arbitrary locations: CABER only analyzes preconditions and S2 only infers pre and postconditions.

The data-driven tool DOrder (Zhu et al., 2016) generates specifications for data structures in OCaml. Given the definition of a data structure, DOrder generates predicates capturing shape and ordering relations among data (e.g., element $x$ is reachable from element $y$ or the value $x$ appears in the left subtree of a node containing the value $y$ ), learns specifications from predicates and in/output data, and verifies specifications using a refinement type system. SLING makes an orthogonal contribution in finding a different form of shape properties to express sharing and aliasing information, e.g., nodes $x$ and $y$ point to (sub)trees in separate heaps.

The tool Locust (Brockschmidt et al., 2017) hybridizes dynamic and static analyses to infer SL invariants for programs written in a restricted language. To infer an invariant, Locust expands the syntax of an SL formula using a machine learning model trained from a large set of data. Locust iteratively refines inferred invariants using counterexamples obtained by the Grasshopper static verifier (Piskac et al., 2014). Locust is mainly evaluated on example programs with singly-linked lists and binary trees and does not support more complex data structures (e.g., Locust does not support doubly-linked list and returns no results when applied to our concat example). The tool also relies on an expensive training process over large data sets.

Finally, automatic verification tools such as HIP (Chin et al., 2012), Grasshopper (Piskac et al., 2014), Verifast (Jacobs et al., 2011), and VCDryad (Pek et al., 2014) can prove given SL specifications and invariants in heap-based programs. In future work, we intend to use these tools to automatically check SLING’s inferred invariants.

7. Conclusion

We introduce a new dynamic analysis to infer SL invariants for heap-manipulating programs. The approach is based on the insight that the heap at a program location can be partitioned into disjoint regions reachable from various stack variables and that these regions can be modeled by atomic SL formulae. Moreover, these formulae can be dynamically inferred and then combined using separating conjunction.

We present SLING, a tool that implements these ideas to generate SL invariants at arbitrary program locations. SLING has several technical details including finding and using boundary variables instantiate predicates, using an SL model checker to compute both an instantiation for existential variables and the residual heap, and using the frame rule to validate inferred specifications.

Preliminary results on a large set of nontrivial programs show that SLING is effective in discovering useful invariants describing operations over a wide variety of data structures. We believe that SLING takes an important step in broadening the space of properties about heap programs that can be dynamically inferred and exposes opportunities for researchers to exploit new dynamic SL invariant analyses.

Acknowledgments

We thank our shepherd Nadia Polikarpova and the anonymous reviewers for their feedback. The first author was supported by the Office of Naval Research award N000141712787.

Bibliography53

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1(1)
2B Le et al . (2016) Tien-Duy B Le, David Lo, Claire Le Goues, and Lars Grunske. 2016. A learning-to-rank based fault localization approach using likely invariants. In ISSTA . ACM, 177–188.
3Ball and Rajamani (2002) Thomas Ball and Sriram K. Rajamani. 2002. The SLAM Project: Debugging System Software via Static Analysis. In POPL . 1–3.
4Berdine et al . (2004) Josh Berdine, Cristiano Calcagno, and Peter W. O’Hearn. 2004. A Decidable Fragment of Separation Logic. In Foundations of Software Technology and Theoretical Computer Science . 97–109.
5Beyer (2017) Dirk Beyer. 2017. Software Verification with Validation of Results. In TACAS . 331–349.
6Brockschmidt et al . (2017) Marc Brockschmidt, Yuxin Chen, Pushmeet Kohli, Siddharth Krishna, and Daniel Tarlow. 2017. Learning Shape Analysis. In Static Analysis Symposium . 66–87.
7Brotherston and Gorogiannis (2014) James Brotherston and Nikos Gorogiannis. 2014. Cyclic Abduction of Inductively Defined Safety and Termination Preconditions. In Static Analysis . 68–84.
8Brotherston et al . (2016) James Brotherston, Nikos Gorogiannis, Max I. Kanovich, and Reuben Rowe. 2016. Model checking for symbolic-heap separation logic with inductive predicates. In POPL . 84–96.