SyGuS-Comp 2018: Results and Analysis

Rajeev Alur; Dana Fisman; Saswat Padhi; Rishabh Singh and; Abhishek Udupa

arXiv:1904.07146·cs.PL·April 16, 2019

SyGuS-Comp 2018: Results and Analysis

Rajeev Alur, Dana Fisman, Saswat Padhi, Rishabh Singh and, Abhishek Udupa

PDF

TL;DR

The paper reports on the 2018 Syntax-Guided Synthesis competition, analyzing the performance of various solvers on over 1600 benchmarks to advance research in efficient synthesis techniques.

Contribution

It provides a comprehensive analysis of solver performances in the 2018 SyGuS-Comp, highlighting progress and challenges in syntax-guided synthesis research.

Findings

01

Multiple solvers demonstrated varying success rates across benchmarks.

02

The competition identified key areas for improvement in synthesis algorithms.

03

Benchmark diversity helped evaluate solver robustness.

Abstract

Syntax-guided synthesis (SyGuS) is the computational problem of finding an implementation $f$ that meets both a semantic constraint given by a logical formula $ϕ$ in a background theory $T$ , and a syntactic constraint given by a grammar $G$ , which specifies the allowed set of candidate implementations. Such a synthesis problem can be formally defined in the SyGuS input format (SyGuS-IF), a language that is built on top of SMT-LIB. The Syntax-Guided Synthesis competition (SyGuS-Comp) is an effort to facilitate, bring together and accelerate research and development of efficient solvers for SyGuS by providing a platform for evaluating different synthesis techniques on a comprehensive set of benchmarks. In the 5th SyGuS-Comp, five solvers competed on over 1600 benchmarks across various tracks. This paper presents and analyses the results of this year's (2018) SyGuS…

Figures34

Click any figure to enlarge with its caption.

Tables4

Table 1. Table 1: New benchmarks contributed to various tracks

Track	Benchmarks	Contributors
CLIA	15	Kangjing Huang (Purdue University)
General	29	Qinheping Hu and Loris D’ Antoni (University of Wisconsin-Madison)
Invariants	21 + 32	Saswat Padhi (UCLA) + Kangjing Huang (Purdue University)
PBE-Strings	10	Woosuk Lee (University of Pennsylvania)

Table 2. Table 2: List of registered solvers

Solver	Authors
${CVC4}_{_} 2018$	Andrew Reynolds (Univ. of Iowa), Haniel Barbosa (Univ. of Iowa), Andrez Notzli (Stanford),
	Cesare Tinelli (Univ. of Iowa), and Clark Barrett (Stanford)
DryadSynth	KangJing Huang (Purdue Univ.), Xiaokang Qiu (Purdue Univ.), Qi Tan (Nanjing Univ.), and
	Yanjun Wang (Purdue Univ.)
${EUSolver}_{_} 2017$	Arjun Radhakrishna (Microsoft) and Abhishek Udupa (Microsoft)
Horndini	Deepak D’Souza (IISc, Bangalore), P. Ezudheen (IISc, Bangalore), P. Madhusudan (UIUC),
	Pranav Garg (Amazon), Daniel Neider (MPI-SWS), and Shubham Ugare (IIT, Guwahati)
LoopInvGen	Saswat Padhi (UCLA), Rahul Sharma (Microsoft Research), and Todd Millstein (UCLA)

Table 3. Table 3 : Participating solvers

	Solver
Track	${CVC4}_{_} 2018$	DryadSynth	${EUSolver}_{_} 2017$	Horndini	LoopInvGen
CLIA	1	1	1	0	0
INV	1	1	1	1	1
General	1	0	1	0	0
PBE-Strings	1	0	1	0	0
PBE-BV	1	0	1	0	0

Table 4. Table 4 : The performance of various solvers across all categories of the general track

		Compiler Optimizations and Bit Vectors	Let and Motion Planning	Invariant Generation with Bounded Ints	Invariant Generation with Unbounded Ints	Multiple Functions	Arrays	Hackers Delight	Integers	Program Repair	ICFP	Cryptographic Circuits	Instruction Selection	Total
Number of benchmarks		32	30	28	28	32	35	69	34	18	50	214	28	598
Solved	${CVC4}_{_} 2018$	16	17	24	24	13	31	62	34	17	50	160	0	448
	${EUSolver}_{_} 2017$	16	10	24	23	18	31	53	33	14	50	148	0	420
	${CVC4}_{_} 2017$	15	15	24	24	12	31	62	34	17	48	116	0	398
Fastest	${CVC4}_{_} 2018$	15	15	22	24	9	31	59	33	16	23	119	0	366
	EUSolver	13	1	12	11	14	5	29	15	12	45	109	0	266
	${CVC4}_{_} 2017$	12	9	16	14	9	24	60	33	6	20	49	0	252
Uniquely	${CVC4}_{_} 2018$	1	2	0	0	0	0	0	0	2	0	7	0	12
	EUSolver	3	0	0	0	6	0	0	0	0	0	0	0	9
	${CVC4}_{_} 2017$	0	0	0	0	0	0	0	0	0	0	0	0	0

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

SyGuS-Comp 2018: Results and Analysis

Rajeev Alur University of Pennsylvania, USA [email protected] Ben-Gurion University, IsraelUniversity of California,

Los Angeles, USAGoogle Brain, USAMicrosoft, USA

Dana Fisman Ben-Gurion University, Israel [email protected] University of California,

Los Angeles, USAGoogle Brain, USAMicrosoft, USA

Saswat Padhi University of California,

Los Angeles, USA [email protected] Google Brain, USAMicrosoft, USA

Rishabh Singh Google Brain, USA [email protected] Microsoft, USA

Abhishek Udupa Microsoft, USA [email protected]

Abstract

Syntax-guided synthesis (SyGuS) is the computational problem of finding an implementation $f$ that meets both a semantic constraint given by a logical formula $\varphi$ in a background theory $T$ , and a syntactic constraint given by a grammar $G$ , which specifies the allowed set of candidate implementations. Such a synthesis problem can be formally defined in the SyGuS input format (SyGuS-IF), a language that is built on top of SMT-LIB.

The syntax-guided synthesis competition (SyGuS-Comp) is an effort to facilitate, bring together and accelerate research and development of efficient solvers for SyGuS by providing a platform for evaluating different synthesis techniques on a comprehensive set of benchmarks. In the 5th SyGuS-Comp, five solvers competed on over 1600 benchmarks across various tracks. This paper presents and analyses the results of this year’s (2018) SyGuS competition.

1 Introduction

The Syntax-Guided Synthesis competition (SyGuS-Comp) is an annual competition aimed to provide an objective platform for comparing different approaches for solving the syntax-guided synthesis problem. A SyGuS problem takes as input a logical specification $\varphi$ for what a synthesized function $f$ should compute, and a grammar $G$ providing syntactic restrictions on the implementation for the function $f$ to be synthesized. Formally, a solution to a SyGuS instance $(\varphi,G,f)$ is a function $f_{\_}{imp}$ that is expressible in the grammar $G$ such that the formula $\varphi[f/f_{\_}{imp}]$ obtained on replacing $f$ by $f_{\_}{imp}$ in the logical specification $\varphi$ is valid. SyGuS instances are formulated in SyGuS-IF [10], a format built on top of SMT-LIB2 [5].

We report here on the 5th SyGuS competition that took place in July 2018, in Oxford, UK as a satellite event of CAV’18 (The 30th International Conference on Computer-Aided Verification) and SYNT’18 (The 7th Workshop on Synthesis). As in the previous year’s competition, there were four tracks: the general track, the conditional linear integer arithmetic (CLIA) track, the invariant synthesis (Inv) track, and the programming by examples (PBE) track. We assume that most readers of this report would already be familiar with the SyGuS problem and the SyGuS-Comp tracks, and thus refer unfamiliar readers to the report on last year’s competition [3].

The report is organized as follows.

•

Section 2 briefly describes the benchmarks and the key idea behind the submitted solvers.

•

Section 3 provides details on the experimental setup.

•

Section 4 gives an overview of the results per track.

•

Section 5 provides details on the results, given from a single benchmark perspective.

•

Section 6 concludes the report with some key takeaway points.

2 Submitted Benchmarks and Solvers

In addition to the benchmarks from the last year’s competition, we received over $100$ new benchmarks this year, across various competition tracks, which we summarize below in Table 1.

Five solvers were submitted to this year’s competition:

(1) $\textsc{CVC4}_{\_}{2018}$ , an improved version of CVC4,

(2) DryadSynth, a solver specialized for conditional linear integer arithmetic,

(3) $\textsc{EUSolver}_{\_}{2017}$ , an improved version of EUSolver,

(4) Horndini, a solver specialized for constrained horn clauses (CHCs) and

(5) LoopInvGen, a solver specialized for invariant generation problems.

Table 2 lists the submitted solvers along with their authors, and Table 3 shows the tracks in which each solver participated.

The $\textsc{CVC4}_{\_}{2018}$ solver is based on an approach for program synthesis that is implemented inside an SMT solver [11]. This approach extracts solution functions from unsatisfiability proofs of the negated form of synthesis conjectures, and uses counterexample-guided techniques for quantifier instantiation (CEGQI) that make finding such proofs practically feasible. $\textsc{CVC4}_{\_}{2018}$ also combines enumerative techniques, and symmetry breaking techniques [12].

The DryadSynth solver combines enumerative and symbolic techniques. It considers benchmarks in conditional linear integer arithmetic theory (LIA), and can therefore assume all have a solution in some pre-defined decision tree normal form. It then tries to first get the correct height of a normal form decision tree, and then tries to synthesize a solution of that height. It makes use of parallelization, using as many cores as are available, and of optimizations based on solutions of typical LIA SyGuS problems.

The $\textsc{EUSolver}_{\_}{2017}$ solver uses a divide-and-conquer strategy [2] to find different expressions that satisfy different subsets of the input space, and then unifies them into a solution that works well for the entire space of inputs. Subexpressions are typically found using enumeration techniques and are then unified into the final solutions using decision tree learning [4].

The Horndini solver extends the classical IC3 decision-tree algorithm with the Horn implication counterexamples (Horn-ICE) framework [6], which extends the ICE-learning model. The authors describe a decision-tree learning algorithm that learns from Horn-ICE samples, works in polynomial time, and uses statistical heuristics to learn small trees that satisfy the samples.

The LoopInvGen solver [9] for invariant synthesis extends the data-driven approach to inferring sufficient loop invariants from a set of program states [8]. Previous approaches to invariant synthesis were restricted to using a fixed set, or a fixed template for features, e.g., ICE-DT [7] requires the shape of constraints (such as octagonal) to be fixed apriori. Instead LoopInvGen starts with no initial features, and automatically learns features as necessary using program synthesis. It reduces the problem of loop invariant inference to a series of precondition inference problems, and uses a counterexample-guided inductive synthesis (CEGIS) loop to revise the current candidate.

3 Experimental Setup

The solvers were run on the StarExec platform [13] with a dedicated cluster of 12 nodes, where each node consisted of two 4-core 2.4 GHz Intel processors with 256 GB RAM and 1 TB hard-disk space. The memory usage limit for each solver run was set to 128 GB, and the wall-clock time limit is set to 3600 seconds (thus, a solver that used all 4 cores could consume at most 14400 seconds of CPU time). The solutions that the solvers produced were checked for both syntactic and semantic correctness. That is, a first postprocessor checked that the produced expression adhered to the grammar specified in the given benchmark, and if this check passes, a second postprocessor checked that the solution adhered to semantic constraints given in the benchmark (by invoking an SMT solver).

4 Results Overview

The primary criterion for winning a track was the number of benchmarks solved, but we also analyzed the time to solve and the the size of the generated expressions. The overall score for each solver was computed as $5N+3F+S$ . Here $N$ denotes the number of benchmarks solved by the solver, $F$ denotes the number of benchmarks solved among the fastest, and $S$ denotes the number of benchmarks for which the size of the generated solution was among the shortest. We used a pseudo-logarithmic scale for $F$ and $S$ . For time to solve, the scale is: $[0,1)$ , $[1,3)$ , $[3,10)$ , $[10,30)$ , $[30,100)$ , $[100,300)$ , $[300,1000)$ , $[1000,3600)$ , $\geqslant 3600$ . That is, the first “bucket” refers to termination in less than one second, the second to termination in one to three seconds and so on. We say that a solver solved a certain benchmark among the fastest if the time it took to solve that benchmark is in the same bucket as that of the solver which solved that benchmark in minimum time. Similarly, for expression sizes, the pseudo-logarithmic scale we use is: $[1,10)$ , $[10,30)$ , $[30,100)$ , $[100,300)$ , $[300,1000)$ , $\geqslant 1000$ , where expression size is the number of nodes in the SyGuS parse-tree. We also report on the number of benchmarks solved uniquely by a solver, i.e. the number of benchmarks which no solver other than the particular solver could solve.

In Figure 1, we show the number of benchmarks solved, the number of benchmarks solved among the fastest, and the number of synthesized expressions among the smallest size; per solver per track.

General Track

In the general track, $\textsc{CVC4}_{\_}{2018}$ solved the most number of benchmarks ( $448$ ), and $\textsc{EUSolver}_{\_}{2017}$ came second, solving $420$ . We note that the new version $\textsc{CVC4}_{\_}{2018}$ is significantly better than the previous version $\textsc{CVC4}_{\_}{2017}$ , which could only solve $398$ benchmarks. The same order appears in the number of benchmarks solved among the fastest: $\textsc{CVC4}_{\_}{2018}$ with $366$ , $\textsc{EUSolver}_{\_}{2017}$ with $266$ , and $\textsc{CVC4}_{\_}{2017}$ with $252$ . Finally, we note that $\textsc{CVC4}_{\_}{2018}$ is able to solve $12$ benchmarks that no other solver could solve, and similarly there are $9$ benchmarks that only EUSolver could solve.

We partitioned the benchmarks of the general track to a number of categories, each containing a set of related benchmarks. The results per category are given in the Table 4. We observe that $\textsc{EUSolver}_{\_}{2017}$ preformed significantly better in the “Multiple Functions” and “ICFP” categories. While the $\textsc{CVC4}_{\_}{2018}$ solver preformed better in the other categories, none of the solvers could solve any of the benchmarks from the “Instruction Selection” category.

Conditional Linear Arithmetic Track

In the CLIA track, $\textsc{CVC4}_{\_}{2018}$ and DryadSynth had a close competition. $\textsc{CVC4}_{\_}{2018}$ solved $85$ out of $88$ benchmarks, DryadSynth solved $84$ benchmarks, and $\textsc{EUSolver}_{\_}{2017}$ solved $81$ benchmarks. In terms of the time to solve, DryadSynth solved $79$ benchmarks among the fastest, $\textsc{CVC4}_{\_}{2018}$ solved $74$ , followed by $\textsc{EUSolver}_{\_}{2017}$ which solved $29$ among the fastest. There were two benchmarks that were solved uniquely by DryadSynth, and one that was solved uniquely by $\textsc{CVC4}_{\_}{2018}$ .

Invariant Generation Track

In the invariant generation track, the LoopInvGen solver solved $115$ out of $127$ benchmarks, $\textsc{CVC4}_{\_}{2018}$ solved $109$ , DryadSynth solved $103$ , Horndini solved $68$ and $\textsc{EUSolver}_{\_}{2017}$ solved $61$ benchmarks. In terms of the time to solve, LoopInvGen solved $102$ benchmarks among the fastest, followed by $\textsc{CVC4}_{\_}{2018}$ which solved $82$ , DryadSynth which solved $81$ , Horndini which solved $21$ , and $\textsc{EUSolver}_{\_}{2017}$ which solved $15$ . There was one benchmark that was solved by a unique solver – the fib17n.sl benchmark solved by LoopInvGen.

Programming By Example (Bit Vectors) Track

In the PBE track on the theory of bit vectors, the $\textsc{CVC4}_{\_}{2018}$ solver solved $724$ out of $750$ benchmarks and $\textsc{EUSolver}_{\_}{2017}$ solved $677$ benchmarks. In terms of the time to solve, $\textsc{CVC4}_{\_}{2018}$ solved $508$ benchmarks among the fastest, and $\textsc{EUSolver}_{\_}{2017}$ solved $468$ . However, $\textsc{EUSolver}_{\_}{2017}$ generates shorter expressions than $\textsc{CVC4}_{\_}{2018}$ in significantly many cases. There were four benchmarks that were solved uniquely by $\textsc{CVC4}_{\_}{2018}$ , and one benchmark that was solved uniquely by $\textsc{EUSolver}_{\_}{2017}$ .

Programming By Example (Strings) Track

In the PBE track on the theory of strings, the $\textsc{CVC4}_{\_}{2018}$ solver solved $102$ out of $118$ benchmarks, and $\textsc{EUSolver}_{\_}{2017}$ solved $79$ benchmarks. In terms of the time to solve, $\textsc{CVC4}_{\_}{2018}$ solved $98$ benchmarks among the fastest, and $\textsc{EUSolver}_{\_}{2017}$ solved $79$ . We note again that $\textsc{EUSolver}_{\_}{2017}$ generates shorter expressions than $\textsc{CVC4}_{\_}{2018}$ in several cases. There were $21$ benchmarks that were solved uniquely by $\textsc{CVC4}_{\_}{2018}$ .

5 Detailed Results

In this section we show the results of the competition from the benchmark’s perspective. For a given benchmark we would like to know:

(1) how many solvers solved it

(2) what are the minimum and maximum times required to solve

(3) what are the minimum and maximum sizes of solutions generated

(4) which solver solved the benchmark the fastest, and

(5) which solver produced the smallest expression.

We present the results in groups organized per track and category. For instance, the top plot in Figure 6 presents the details for program repair benchmarks from the general track. The black bars above the $y$ -axis show the range of time taken to solve across the various solvers, in our pseudo logarithmic scale. Inspect for instance benchmark t2.sl. The black bar indicates that the fastest solver takes less than $1$ second, and the slowest one takes between $100$ to $300$ seconds. The black number above the black bar indicates the exact number of seconds (floor-rounded to the nearest second) it took the slowest solver to solve a benchmark (and $\infty$ if at least one solver exceeded the time bound). Thus, we can see that for t2.sl, the slowest solver took $138$ seconds. The white number at the lower part of the bar indicates the time taken by the fastest solver. Thus, we can see that for t2.sl, the fastest solver required less than $1$ second. The colored squares/rectangles below the black bar indicate which solvers were among the fastest to solve that benchmark (according to the solvers’ legend at the top). For instance, we can see that $\textsc{CVC4}_{\_}{2018}$ and $\textsc{EUSolver}_{\_}{2017}$ were the fastest to solve t2.sl, solving it in less than a second, and that among the solvers that solved t4.sl only $\textsc{EUSolver}_{\_}{2017}$ solved it in less than a second.

Similarly, the gray bars below the $y$ -axis indicate the range of expression sizes in pseudo-logarithmic scales, where the size of an expression is determined by the number of nodes in its parse tree. The black number at the bottom of the gray bar indicates the exact size of the largest solution (or $\infty$ if it exceeded $1000$ ), and the white number at the top of the gray bar indicates the exact size of the smallest solution. When the smallest and largest size of expressions are in the same pseudo-logarithmic bucket, as is the case in t2.sl), we indicate the expression size only in black. The colored squares/rectangles above the gray bar indicate which solvers were amongst the ones that produced the smallest expression (according to the solvers’ legend at the top). For instance, for t20.sl the smallest expression produced had size $3$ , which is produced only by $\textsc{EUSolver}_{\_}{2017}$ .

Finally, the top $x$ -axis indicates the number of solvers that solved a particular benchmark. For instance, in Figure 6, only one solver solved t6.sl, two solvers solved t14.sl, three solvers solved t2.sl, and no solver solved t13.sl. Note that for the benchmarks that no solver is able to solve, the black bars indicate the range of time taken by solvers to terminate. When no solver produces a correct result, there are no colored squares/rectangles below the black bars, as is the case for t13.sl.

6 Summary

This year’s competition consisted of over $1600$ benchmarks, $107$ of which where contributed this year. Five solvers competed this year, one of which was submitted by developers creating a tool for SyGuS-Comp for the first time. All tools preformed remarkably well, on both existing and new benchmarks. In particular, more than $74$ % of the current set of benchmarks from the general track are now solved. However, there are several classes of problems that are still challenging for the current solvers, especially the “Instruction Selection” and “Multiple Functions” categories; and we hope the developers would continue to improve their SyGuS techniques and advance the state of the art.

Bibliography13

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1]
2[2] Rajeev Alur, Pavol Cerný & Arjun Radhakrishna (2015): Synthesis Through Unification . In: Computer Aided Verification - 27th International Conference, CAV, Proceedings, Part II , pp. 163–179.
3[3] Rajeev Alur, Dana Fisman, Rishabh Singh & Armando Solar-Lezama (2017): Sy Gu S-Comp 2017: Results and Analysis . In: Proceedings of the Sixth Workshop on Synthesis, SYNT@CAV , EPTCS 260, pp. 97–115.
4[4] Rajeev Alur, Arjun Radhakrishna & Abhishek Udupa (2017): Scaling Enumerative Program Synthesis via Divide and Conquer . In: Tools and Algorithms for the Construction and Analysis of Systems - 23rd International Conference, TACAS, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS, Proceedings, Part I , pp. 319–336.
5[5] Clark Barrett, Aaron Stump & Cesare Tinelli: The SMT-LIB Standard – Version 2.0 .
6[6] P. Ezudheen, Daniel Neider, Deepak D’Souza, Pranav Garg & P. Madhusudan (2018): Horn-ICE Learning for Synthesizing Invariants and Contracts . PACMPL 2(OOPSLA), pp. 131:1–131:25.
7[7] Pranav Garg, Daniel Neider, P. Madhusudan & Dan Roth (2016): Learning Invariants using Decision Trees and Implication Counterexamples . In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL , pp. 499–512.
8[8] Saswat Padhi, Rahul Sharma & Todd D. Millstein (2016): Data-driven precondition inference with learned features . In: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI , pp. 42–56.