SyGuS-Comp 2018: Results and Analysis
Rajeev Alur, Dana Fisman, Saswat Padhi, Rishabh Singh and, Abhishek Udupa

TL;DR
The paper reports on the 2018 Syntax-Guided Synthesis competition, analyzing the performance of various solvers on over 1600 benchmarks to advance research in efficient synthesis techniques.
Contribution
It provides a comprehensive analysis of solver performances in the 2018 SyGuS-Comp, highlighting progress and challenges in syntax-guided synthesis research.
Findings
Multiple solvers demonstrated varying success rates across benchmarks.
The competition identified key areas for improvement in synthesis algorithms.
Benchmark diversity helped evaluate solver robustness.
Abstract
Syntax-guided synthesis (SyGuS) is the computational problem of finding an implementation that meets both a semantic constraint given by a logical formula in a background theory , and a syntactic constraint given by a grammar , which specifies the allowed set of candidate implementations. Such a synthesis problem can be formally defined in the SyGuS input format (SyGuS-IF), a language that is built on top of SMT-LIB. The Syntax-Guided Synthesis competition (SyGuS-Comp) is an effort to facilitate, bring together and accelerate research and development of efficient solvers for SyGuS by providing a platform for evaluating different synthesis techniques on a comprehensive set of benchmarks. In the 5th SyGuS-Comp, five solvers competed on over 1600 benchmarks across various tracks. This paper presents and analyses the results of this year's (2018) SyGuS…
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11
Figure 12
Figure 13
Figure 14
Figure 15
Figure 16
Figure 17
Figure 18
Figure 19
Figure 20
Figure 21
Figure 22
Figure 23
Figure 24
Figure 25
Figure 26
Figure 27
Figure 28
Figure 29
Figure 30
Figure 31
Figure 32
Figure 33
Figure 34| Track | Benchmarks | Contributors |
|---|---|---|
| CLIA | 15 | Kangjing Huang (Purdue University) |
| General | 29 | Qinheping Hu and Loris D’ Antoni (University of Wisconsin-Madison) |
| Invariants | 21 + 32 | Saswat Padhi (UCLA) + Kangjing Huang (Purdue University) |
| PBE-Strings | 10 | Woosuk Lee (University of Pennsylvania) |
| Solver | Authors |
|---|---|
| Andrew Reynolds (Univ. of Iowa), Haniel Barbosa (Univ. of Iowa), Andrez Notzli (Stanford), | |
| Cesare Tinelli (Univ. of Iowa), and Clark Barrett (Stanford) | |
| DryadSynth | KangJing Huang (Purdue Univ.), Xiaokang Qiu (Purdue Univ.), Qi Tan (Nanjing Univ.), and |
| Yanjun Wang (Purdue Univ.) | |
| Arjun Radhakrishna (Microsoft) and Abhishek Udupa (Microsoft) | |
| Horndini | Deepak D’Souza (IISc, Bangalore), P. Ezudheen (IISc, Bangalore), P. Madhusudan (UIUC), |
| Pranav Garg (Amazon), Daniel Neider (MPI-SWS), and Shubham Ugare (IIT, Guwahati) | |
| LoopInvGen | Saswat Padhi (UCLA), Rahul Sharma (Microsoft Research), and Todd Millstein (UCLA) |
| Solver | |||||
| Track | DryadSynth | Horndini | LoopInvGen | ||
| CLIA | 1 | 1 | 1 | 0 | 0 |
| INV | 1 | 1 | 1 | 1 | 1 |
| General | 1 | 0 | 1 | 0 | 0 |
| PBE-Strings | 1 | 0 | 1 | 0 | 0 |
| PBE-BV | 1 | 0 | 1 | 0 | 0 |
| Compiler Optimizations and Bit Vectors | Let and Motion Planning | Invariant Generation with Bounded Ints | Invariant Generation with Unbounded Ints | Multiple Functions | Arrays | Hackers Delight | Integers | Program Repair | ICFP | Cryptographic Circuits | Instruction Selection | Total | ||
| Number of benchmarks | 32 | 30 | 28 | 28 | 32 | 35 | 69 | 34 | 18 | 50 | 214 | 28 | 598 | |
| Solved | 16 | 17 | 24 | 24 | 13 | 31 | 62 | 34 | 17 | 50 | 160 | 0 | 448 | |
| 16 | 10 | 24 | 23 | 18 | 31 | 53 | 33 | 14 | 50 | 148 | 0 | 420 | ||
| 15 | 15 | 24 | 24 | 12 | 31 | 62 | 34 | 17 | 48 | 116 | 0 | 398 | ||
| Fastest | 15 | 15 | 22 | 24 | 9 | 31 | 59 | 33 | 16 | 23 | 119 | 0 | 366 | |
| EUSolver | 13 | 1 | 12 | 11 | 14 | 5 | 29 | 15 | 12 | 45 | 109 | 0 | 266 | |
| 12 | 9 | 16 | 14 | 9 | 24 | 60 | 33 | 6 | 20 | 49 | 0 | 252 | ||
| Uniquely | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 7 | 0 | 12 | |
| EUSolver | 3 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
SyGuS-Comp 2018: Results and Analysis
Rajeev Alur University of Pennsylvania, USA [email protected] Ben-Gurion University, IsraelUniversity of California,
Los Angeles, USAGoogle Brain, USAMicrosoft, USA
Dana Fisman Ben-Gurion University, Israel [email protected] University of California,
Los Angeles, USAGoogle Brain, USAMicrosoft, USA
Saswat Padhi University of California,
Los Angeles, USA [email protected] Google Brain, USAMicrosoft, USA
Rishabh Singh Google Brain, USA [email protected] Microsoft, USA
Abhishek Udupa Microsoft, USA [email protected]
Abstract
Syntax-guided synthesis (SyGuS) is the computational problem of finding an implementation that meets both a semantic constraint given by a logical formula in a background theory , and a syntactic constraint given by a grammar , which specifies the allowed set of candidate implementations. Such a synthesis problem can be formally defined in the SyGuS input format (SyGuS-IF), a language that is built on top of SMT-LIB.
The syntax-guided synthesis competition (SyGuS-Comp) is an effort to facilitate, bring together and accelerate research and development of efficient solvers for SyGuS by providing a platform for evaluating different synthesis techniques on a comprehensive set of benchmarks. In the 5th SyGuS-Comp, five solvers competed on over 1600 benchmarks across various tracks. This paper presents and analyses the results of this year’s (2018) SyGuS competition.
1 Introduction
The Syntax-Guided Synthesis competition (SyGuS-Comp) is an annual competition aimed to provide an objective platform for comparing different approaches for solving the syntax-guided synthesis problem. A SyGuS problem takes as input a logical specification for what a synthesized function should compute, and a grammar providing syntactic restrictions on the implementation for the function to be synthesized. Formally, a solution to a SyGuS instance is a function that is expressible in the grammar such that the formula obtained on replacing by in the logical specification is valid. SyGuS instances are formulated in SyGuS-IF [10], a format built on top of SMT-LIB2 [5].
We report here on the 5th SyGuS competition that took place in July 2018, in Oxford, UK as a satellite event of CAV’18 (The 30th International Conference on Computer-Aided Verification) and SYNT’18 (The 7th Workshop on Synthesis). As in the previous year’s competition, there were four tracks: the general track, the conditional linear integer arithmetic (CLIA) track, the invariant synthesis (Inv) track, and the programming by examples (PBE) track. We assume that most readers of this report would already be familiar with the SyGuS problem and the SyGuS-Comp tracks, and thus refer unfamiliar readers to the report on last year’s competition [3].
The report is organized as follows.
- •
Section 2 briefly describes the benchmarks and the key idea behind the submitted solvers.
- •
Section 3 provides details on the experimental setup.
- •
Section 4 gives an overview of the results per track.
- •
Section 5 provides details on the results, given from a single benchmark perspective.
- •
Section 6 concludes the report with some key takeaway points.
2 Submitted Benchmarks and Solvers
In addition to the benchmarks from the last year’s competition, we received over new benchmarks this year, across various competition tracks, which we summarize below in Table 1.
Five solvers were submitted to this year’s competition:
(1) , an improved version of CVC4,
(2) DryadSynth, a solver specialized for conditional linear integer arithmetic,
(3) , an improved version of EUSolver,
(4) Horndini, a solver specialized for constrained horn clauses (CHCs) and
(5) LoopInvGen, a solver specialized for invariant generation problems.
Table 2 lists the submitted solvers along with their authors, and Table 3 shows the tracks in which each solver participated.
The solver is based on an approach for program synthesis that is implemented inside an SMT solver [11]. This approach extracts solution functions from unsatisfiability proofs of the negated form of synthesis conjectures, and uses counterexample-guided techniques for quantifier instantiation (CEGQI) that make finding such proofs practically feasible. also combines enumerative techniques, and symmetry breaking techniques [12].
The DryadSynth solver combines enumerative and symbolic techniques. It considers benchmarks in conditional linear integer arithmetic theory (LIA), and can therefore assume all have a solution in some pre-defined decision tree normal form. It then tries to first get the correct height of a normal form decision tree, and then tries to synthesize a solution of that height. It makes use of parallelization, using as many cores as are available, and of optimizations based on solutions of typical LIA SyGuS problems.
The solver uses a divide-and-conquer strategy [2] to find different expressions that satisfy different subsets of the input space, and then unifies them into a solution that works well for the entire space of inputs. Subexpressions are typically found using enumeration techniques and are then unified into the final solutions using decision tree learning [4].
The Horndini solver extends the classical IC3 decision-tree algorithm with the Horn implication counterexamples (Horn-ICE) framework [6], which extends the ICE-learning model. The authors describe a decision-tree learning algorithm that learns from Horn-ICE samples, works in polynomial time, and uses statistical heuristics to learn small trees that satisfy the samples.
The LoopInvGen solver [9] for invariant synthesis extends the data-driven approach to inferring sufficient loop invariants from a set of program states [8]. Previous approaches to invariant synthesis were restricted to using a fixed set, or a fixed template for features, e.g., ICE-DT [7] requires the shape of constraints (such as octagonal) to be fixed apriori. Instead LoopInvGen starts with no initial features, and automatically learns features as necessary using program synthesis. It reduces the problem of loop invariant inference to a series of precondition inference problems, and uses a counterexample-guided inductive synthesis (CEGIS) loop to revise the current candidate.
3 Experimental Setup
The solvers were run on the StarExec platform [13] with a dedicated cluster of 12 nodes, where each node consisted of two 4-core 2.4 GHz Intel processors with 256 GB RAM and 1 TB hard-disk space. The memory usage limit for each solver run was set to 128 GB, and the wall-clock time limit is set to 3600 seconds (thus, a solver that used all 4 cores could consume at most 14400 seconds of CPU time). The solutions that the solvers produced were checked for both syntactic and semantic correctness. That is, a first postprocessor checked that the produced expression adhered to the grammar specified in the given benchmark, and if this check passes, a second postprocessor checked that the solution adhered to semantic constraints given in the benchmark (by invoking an SMT solver).
4 Results Overview
The primary criterion for winning a track was the number of benchmarks solved, but we also analyzed the time to solve and the the size of the generated expressions. The overall score for each solver was computed as . Here denotes the number of benchmarks solved by the solver, denotes the number of benchmarks solved among the fastest, and denotes the number of benchmarks for which the size of the generated solution was among the shortest. We used a pseudo-logarithmic scale for and . For time to solve, the scale is: , , , , , , , , . That is, the first “bucket” refers to termination in less than one second, the second to termination in one to three seconds and so on. We say that a solver solved a certain benchmark among the fastest if the time it took to solve that benchmark is in the same bucket as that of the solver which solved that benchmark in minimum time. Similarly, for expression sizes, the pseudo-logarithmic scale we use is: , , , , , , where expression size is the number of nodes in the SyGuS parse-tree. We also report on the number of benchmarks solved uniquely by a solver, i.e. the number of benchmarks which no solver other than the particular solver could solve.
In Figure 1, we show the number of benchmarks solved, the number of benchmarks solved among the fastest, and the number of synthesized expressions among the smallest size; per solver per track.
General Track
In the general track, solved the most number of benchmarks (), and came second, solving . We note that the new version is significantly better than the previous version , which could only solve benchmarks. The same order appears in the number of benchmarks solved among the fastest: with , with , and with . Finally, we note that is able to solve benchmarks that no other solver could solve, and similarly there are benchmarks that only EUSolver could solve.
We partitioned the benchmarks of the general track to a number of categories, each containing a set of related benchmarks. The results per category are given in the Table 4. We observe that preformed significantly better in the “Multiple Functions” and “ICFP” categories. While the solver preformed better in the other categories, none of the solvers could solve any of the benchmarks from the “Instruction Selection” category.
Conditional Linear Arithmetic Track
In the CLIA track, and DryadSynth had a close competition. solved out of benchmarks, DryadSynth solved benchmarks, and solved benchmarks. In terms of the time to solve, DryadSynth solved benchmarks among the fastest, solved , followed by which solved among the fastest. There were two benchmarks that were solved uniquely by DryadSynth, and one that was solved uniquely by .
Invariant Generation Track
In the invariant generation track, the LoopInvGen solver solved out of benchmarks, solved , DryadSynth solved , Horndini solved and solved benchmarks. In terms of the time to solve, LoopInvGen solved benchmarks among the fastest, followed by which solved , DryadSynth which solved , Horndini which solved , and which solved . There was one benchmark that was solved by a unique solver – the fib17n.sl benchmark solved by LoopInvGen.
Programming By Example (Bit Vectors) Track
In the PBE track on the theory of bit vectors, the solver solved out of benchmarks and solved benchmarks. In terms of the time to solve, solved benchmarks among the fastest, and solved . However, generates shorter expressions than in significantly many cases. There were four benchmarks that were solved uniquely by , and one benchmark that was solved uniquely by .
Programming By Example (Strings) Track
In the PBE track on the theory of strings, the solver solved out of benchmarks, and solved benchmarks. In terms of the time to solve, solved benchmarks among the fastest, and solved . We note again that generates shorter expressions than in several cases. There were benchmarks that were solved uniquely by .
5 Detailed Results
In this section we show the results of the competition from the benchmark’s perspective. For a given benchmark we would like to know:
(1) how many solvers solved it
(2) what are the minimum and maximum times required to solve
(3) what are the minimum and maximum sizes of solutions generated
(4) which solver solved the benchmark the fastest, and
(5) which solver produced the smallest expression.
We present the results in groups organized per track and category. For instance, the top plot in Figure 6 presents the details for program repair benchmarks from the general track. The black bars above the -axis show the range of time taken to solve across the various solvers, in our pseudo logarithmic scale. Inspect for instance benchmark t2.sl. The black bar indicates that the fastest solver takes less than second, and the slowest one takes between to seconds. The black number above the black bar indicates the exact number of seconds (floor-rounded to the nearest second) it took the slowest solver to solve a benchmark (and if at least one solver exceeded the time bound). Thus, we can see that for t2.sl, the slowest solver took seconds. The white number at the lower part of the bar indicates the time taken by the fastest solver. Thus, we can see that for t2.sl, the fastest solver required less than second. The colored squares/rectangles below the black bar indicate which solvers were among the fastest to solve that benchmark (according to the solvers’ legend at the top). For instance, we can see that and were the fastest to solve t2.sl, solving it in less than a second, and that among the solvers that solved t4.sl only solved it in less than a second.
Similarly, the gray bars below the -axis indicate the range of expression sizes in pseudo-logarithmic scales, where the size of an expression is determined by the number of nodes in its parse tree. The black number at the bottom of the gray bar indicates the exact size of the largest solution (or if it exceeded ), and the white number at the top of the gray bar indicates the exact size of the smallest solution. When the smallest and largest size of expressions are in the same pseudo-logarithmic bucket, as is the case in t2.sl), we indicate the expression size only in black. The colored squares/rectangles above the gray bar indicate which solvers were amongst the ones that produced the smallest expression (according to the solvers’ legend at the top). For instance, for t20.sl the smallest expression produced had size , which is produced only by .
Finally, the top -axis indicates the number of solvers that solved a particular benchmark. For instance, in Figure 6, only one solver solved t6.sl, two solvers solved t14.sl, three solvers solved t2.sl, and no solver solved t13.sl. Note that for the benchmarks that no solver is able to solve, the black bars indicate the range of time taken by solvers to terminate. When no solver produces a correct result, there are no colored squares/rectangles below the black bars, as is the case for t13.sl.
6 Summary
This year’s competition consisted of over benchmarks, of which where contributed this year. Five solvers competed this year, one of which was submitted by developers creating a tool for SyGuS-Comp for the first time. All tools preformed remarkably well, on both existing and new benchmarks. In particular, more than % of the current set of benchmarks from the general track are now solved. However, there are several classes of problems that are still challenging for the current solvers, especially the “Instruction Selection” and “Multiple Functions” categories; and we hope the developers would continue to improve their SyGuS techniques and advance the state of the art.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1]
- 2[2] Rajeev Alur, Pavol Cerný & Arjun Radhakrishna (2015): Synthesis Through Unification . In: Computer Aided Verification - 27th International Conference, CAV, Proceedings, Part II , pp. 163–179.
- 3[3] Rajeev Alur, Dana Fisman, Rishabh Singh & Armando Solar-Lezama (2017): Sy Gu S-Comp 2017: Results and Analysis . In: Proceedings of the Sixth Workshop on Synthesis, SYNT@CAV , EPTCS 260, pp. 97–115.
- 4[4] Rajeev Alur, Arjun Radhakrishna & Abhishek Udupa (2017): Scaling Enumerative Program Synthesis via Divide and Conquer . In: Tools and Algorithms for the Construction and Analysis of Systems - 23rd International Conference, TACAS, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS, Proceedings, Part I , pp. 319–336.
- 5[5] Clark Barrett, Aaron Stump & Cesare Tinelli: The SMT-LIB Standard – Version 2.0 .
- 6[6] P. Ezudheen, Daniel Neider, Deepak D’Souza, Pranav Garg & P. Madhusudan (2018): Horn-ICE Learning for Synthesizing Invariants and Contracts . PACMPL 2(OOPSLA), pp. 131:1–131:25.
- 7[7] Pranav Garg, Daniel Neider, P. Madhusudan & Dan Roth (2016): Learning Invariants using Decision Trees and Implication Counterexamples . In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL , pp. 499–512.
- 8[8] Saswat Padhi, Rahul Sharma & Todd D. Millstein (2016): Data-driven precondition inference with learned features . In: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI , pp. 42–56.
