Learning-Based Synthesis of Safety Controllers
Daniel Neider, Oliver Markgraf

TL;DR
This paper introduces a machine learning framework for synthesizing reactive safety controllers in complex systems modeled by infinite-duration games, leveraging a novel decision tree learning algorithm to efficiently approximate winning regions.
Contribution
It presents a new decision tree learning approach for safety controller synthesis in infinite and large finite graphs, with convergence guarantees and empirical performance evaluation.
Findings
The decision tree algorithm guarantees convergence to a safety controller.
The framework effectively handles infinite and large finite graphs.
Empirical results show competitive performance against existing methods.
Abstract
We propose a machine learning framework to synthesize reactive controllers for systems whose interactions with their adversarial environment are modeled by infinite-duration, two-player games over (potentially) infinite graphs. Our framework targets safety games with infinitely many vertices, but it is also applicable to safety games over finite graphs whose size is too prohibitive for conventional synthesis techniques. The learning takes place in a feedback loop between a teacher component, which can reason symbolically about the safety game, and a learning algorithm, which successively learns an overapproximation of the winning region from various kinds of examples provided by the teacher. We develop a novel decision tree learning algorithm for this setting and show that our algorithm is guaranteed to converge to a reactive safety controller if a suitable overapproximation of the…
| DT-Synth | SAT-Synth | RPNI-Synth | CONSYNTH | |||||||
| Game | Time in s | Iter. | Size | Time in s | Iter. | Size | Time in s | Iter. | Size | Time in s |
| Box | ||||||||||
| Box Limited | ||||||||||
| Diagonal | ||||||||||
| Evasion | ||||||||||
| Follow | — | |||||||||
| Solitary Box | ||||||||||
| Square | ————————— | — | ||||||||
| Cinderella () | ————————— | ————————— | ————————— | — | ||||||
| Cinderella () | ————————— | ————————— | ————————— | |||||||
| Program-repair | ||||||||||
| Repair-critical | ————————— | |||||||||
| Synth-Synchronization | ————————— | |||||||||
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Learning-Based Synthesis of Safety Controllers
Daniel Neider
Max Planck Institute for Software Systems
Kaiserslautern, Germany
Email: [email protected]
Oliver Markgraf
Max Planck Institute for Software Systems
Kaiserslautern, Germany
Abstract
We propose a machine learning framework to synthesize reactive controllers for systems whose interactions with their adversarial environment are modeled by infinite-duration, two-player games over (potentially) infinite graphs. Our framework targets safety games with infinitely many vertices, but it is also applicable to safety games over finite graphs whose size is too prohibitive for conventional synthesis techniques. The learning takes place in a feedback loop between a teacher component, which can reason symbolically about the safety game, and a learning algorithm, which successively learns an approximation of the winning region from various kinds of examples provided by the teacher. We develop a novel decision tree learning algorithm for this setting and show that our algorithm is guaranteed to converge to a reactive safety controller if a suitable approximation of the winning region can be expressed as a decision tree. Finally, we empirically compare the performance of a prototype implementation to existing approaches, which are based on constraint solving and automata learning, respectively.
I Introduction
Reactive synthesis offers an effective and promising way to solve a crucial practical problem: constructing correct and verified controllers for safety-critical systems. Rather than designing and implementing controllers by hand, reactive synthesis techniques construct controllers in an automatic fashion, thus, freeing engineers from this complex and error-prone task. In addition to being fully automatic, synthesis techniques produce correct-by-construction controllers that guarantee to satisfy the given specification, or they report that no such controller exist.
Typically, reactive synthesis is modeled as an infinite-duration game on a graph that is played by two antagonistic players: the system, which seeks to satisfy the specification, and the environment, which wants to violate it. More precisely, the specification and a model of the environment are in a first step converted into an infinite game. Then, one computes a winning strategy for the system, which prescribes how the system needs to play in order to win against every move of the environment. Finally, the winning strategy is translated into hard- or software, resulting in a reactive controller that satisfies the given specification.
In this paper, we focus on safety games, a class of infinite games that arises from safety specifications. Such specifications are in fact among the most important in practice (e.g., see Dwyer, Avrunin, and Corbett [1] for a survey of specification patterns) and capture many other interesting properties, including bounded-horizon reachability. In contrast to the classical setting, however, we consider safety games not only over finite graphs but also over graphs with infinitely many (even uncountably many) vertices. Such games arise naturally, for instance, when the interaction between the controlled system and its environment is too complex to be modeled by finite graphs (e.g., in motion planning over unbounded environments) or when the environment has access to dynamic data structures, such as lists, stacks, or queues.
When the number of vertices of a game graph is infinite, traditional methods, which typically rely on an explicit exploration of the (whole) graph, are no longer applicable. To enable the computation of winning strategies for such games, our first contribution is a machine learning framework that constructs winning strategies by learning a proxy object called winning set. Intuitively, a winning set is an approximation of the vertices from which the system player can win and that permits to extract a winning strategy in a simple manner.
Our learning framework defines a feedback loop, akin to counterexample-guided inductive synthesis (CEGIS) [2], consisting of two entities: a teacher, who can reason symbolically about the game, and a learning algorithm, whose goal is to learn a winning set from information provided by the teacher. In every iteration of the loop, the learning algorithm constructs a set of vertices, which it proposes to the teacher. The teacher, on the other hand, checks whether the proposed set is a winning set and stops the learning process if so. If this is not the case, the teacher returns a counterexample. Upon receiving a counterexample, the learner refines its conjecture and proceeds with the next iteration.
Motivated by recent success in using decision trees as a concise and effective representation of strategies in infinite games [3], our second contribution is a new learning algorithm for decision trees, which is tailored specifically to the framework sketched above. Our algorithm builds upon a recent learning algorithm for decision trees that has been proposed in the context of software verification [4] and that puts a strong emphasis on learning “small” trees. As a consequence of the latter, our algorithm can in many situations guarantee to learn a winning set if one can be expressed as a decision tree.
Even though a game graph is infinite or prohibitively large, a reactive controller with a compact representation might already realize the specification. In a motion planning scenario, for instance, the system often only needs to consider a small subset of possible interactions with the environment to satisfy the specification. Based on this observation, our learning-based approach possesses various desirable properties:
(i) it leverages machine learning as an effective means to focus on the important parts of a game,
(ii) it often learns “small” winning sets (and, by extension, small winning strategies) in a rule-like format, which tend to be relatively easy for humans to understand (see Brázdil et al. [3]), and
(iii) besides operating over infinite graphs, it guarantees in many situations to learn a winning set if one exists.
In addition, we demonstrate empirically that our approach is highly competitive to existing tools on two sets of benchmarks taken from the literature.
Related Work
Games over various types of infinite graphs have been studied, predominantly in the context of pushdown graphs [5]. For more general classes of graphs, a constraint-based approach [6], relying on constraint solvers such as Z3 [7], and various learning-based approaches have been proposed [8, 9] (we discuss these approaches in Section V). In the context of safety games over finite graphs, recent work [10] has demonstrated the ability of learning-based techniques to extract small controllers from precomputed controllers with a potentially large number of states.
Our learning framework is an extension of an earlier framework by Neider and Topcu [9]. Their work considers so-called rational safety games (defined in terms of finite automata) and proposes an automaton learning approach to infer winning strategies. By contrast, we do not fix a specific representation of the game graph and only require that certain operations can be performed symbolically. Many common formalisms such as finite automata, various types of decision diagrams, as well as formulas in the first-order theories of linear integer and real arithmetic satisfy these requirements. However, we consider only finitely-branching game graphs, whereas Neider and Topcu also consider graphs with infinite branching.
The algorithm we design for learning decision trees builds on top of a learning algorithm recently proposed by Ezudheen et al. [4], which learns from data in form of Horn clauses. For this setting, other learning algorithms have been developed as well [11, 12]. We have chosen Ezudheen et al.’s algorithm specifically for its property to guarantee convergence to a solution in many practical scenarios.
II Controller Synthesis and Safety Games
We follow the game-theoretic approach to controller synthesis as popularized by McNaughton [13] and view the problem as an infinite-duration, two-player game on a directed graph. Such games are played by two antagonistic players: Player [math], who embodies the system, and Player , who embodies the environment. In this setting, the type of specification dictates the type of game, and a winning strategy for Player [math] translates immediately into a controller that satisfies the given specification (we refer the reader to Grädel, Thomas, and Wilke [14] for a comprehensive discussion of this connection). Since we are interested in synthesizing controllers from safety specifications, the remainder of this paper is concerned with so-called safety games. However, before we introduce these types of games formally, let us first fix basic notations.
Basic Notations
Let denote the set of Boolean values ([math] representing and representing ), the set of natural numbers, the set of integers, and the set of real numbers. Given a set , we denote the set of all finite sequences of elements of by and the set of all infinite sequences by . Moreover, for a binary relation and two sets , the image of under is the set and the preimage of under is the set .
Safety Games
Our definitions and notations mainly follow those of Grädel, Thomas, and Wilke [14], and we refer the reader to this textbook for further details on infinite-duration games. Formally, a safety game is a five-tuple consisting of two disjoint sets , of vertices controlled by Player [math] and Player , respectively (we denote their union by and assume ), a directed edge relation , a nonempty set of initial vertices, and a set of safe vertices. The directed graph is typically called game graph. In contrast to the classical setting, we do not restrict to be finite but allow even uncountable sets. However, we do make the following two restrictions to the edge relation: we assume that
(1) every vertex has at least one outgoing edge (i.e., for each ), and
(2) is finite for every , though not necessarily bounded.
Note that the first restriction is standard and simply avoids situations in which the game gets stuck. The second restriction, on the other hand, is required by our learning framework and ensures that the data to learn from is always a finite object.
A safety game is played in rounds: initially, a token is placed on one of the initial vertices ; in each round, the player controlling the current vertex then moves the token to the next vertex along one of the outgoing edges. This process of moving the token is repeated ad infinitum and results in an infinite sequence with and for every , which is called a play. The winner of a play is determined by the winning condition in that a play is winning for Player [math] if for every —otherwise it is winning for Player .
In the framework of infinite games, synthesizing a controller amounts to computing a so-called winning strategy for Player [math], which prescribes how Player [math] needs to move in order to win a play. Formally, a strategy for Player [math] is a function such that for every . A strategy is called winning if every play that is played according to (i.e., that satisfies for all with ) is winning for Player [math]. It is well known that safety games permit memoryless winning strategies where the choice of the next vertex depends only on the vertex the play has currently reached. Such a strategy can then easily be implemented as a controller: the controller tracks the current vertex of a play and chooses the next move according to the strategy. Hence, the objective in the remainder of this paper is to compute a memoryless winning strategy for Player [math]. We refer to this as solving a game.
If the game graph underlying a safety game is finite, memoryless winning strategies can be computed in linear time using a simple fixed-point computation [14]. For infinite game graphs, on the other hand, this is no longer an option as a fixed-point computation might not converge in finite time. To overcome this problem, we propose a novel learning-based approach that learns a (memoryless) winning strategy via a proxy object named winning set.
Winning Sets
Intuitively, a winning set is a subset of the safe vertices that contains all initial vertices and is a trap for Player (i.e., Player [math] can force any play to stay inside this set regardless of how Player plays). Formally, we define winning sets as follows.
Definition 1** (Winning set).**
*Let be a safety game. A winning set is a set satisfying
(1) ,
(2) ,
(3) for all (existential closedness), and
(4) for all (universal closedness).
A winning set immediately provides a winning strategy for Player [math]: starting in , Player [math] simply moves to a (fixed) successor vertex inside whenever it is his turn (note that this is possible since is existentially closed). As is also universally closed, a straightforward induction over the length of plays proves that every play that starts inside and is played according to this strategy stays inside , no matter how Player plays. Thus, Player [math] wins since .
In the remainder, we encourage the reader to think of a winning set as a (symbolic) representation of a winning strategy. The following example, inspired by robotic motion planning, illustrates the concept of winning sets.
Example 1**.**
Consider a robot in an unbounded, one-dimensional grid world as depicted in Figure 1. Moreover, let us assume that the robot’s position, indicated by an “x” in Figure 1, can be modeled as a single real-valued coordinate , allowing for an uncountable number of positions.
The robot’s movement is controlled by two players, the system (Player [math]) and the environment (Player ). To decide which player is currently in control of the robot, we divide the world in infinitely many intervals with : if the robot resides inside an interval with even, then Player [math] is in control; otherwise, Player is in control. In every turn, we allow the system or the environment to move the robot one unit to the left (i.e., decreasing its position by one) or one unit to the right (i.e., increasing its position by one). Note that this implies that the system and the environment are taking turns controlling the robot. We initially place the robot at an arbitrary position and define all positions to be safe. The objective of the system is to stay inside these safe positions at all times.
We can model the above situation as a safety game as follows. The set of Player [math] vertices is , while the set of Player vertices is ; note that . The set of safe vertices is and the set of initial vertices is . Finally, the edge relation is defined as .
*An example of a winning set is . It clearly contains the set of initial vertices. Furthermore, if the position of the robot is in the interval or , Player [math] can stay inside by moving the robot into the interval . Similarly, if the robot is in the interval , Player has no choice but to move the robot either into the interval or into , thus staying inside . Since is also contained in , it is in fact a winning set. As sketched above, a winning strategy is easy to derive. *
Finally, let us note that safety games as defined above subsume games on finite graphs, but one needs to choose a suitable symbolic representation in order to handle (un-)countably infinite game graphs. We comment on this in detail after having introduced our machine learning framework in the next section.
III A Machine Learning Framework for Synthesizing Safety Controllers
We now propose a machine learning framework for learning winning sets in safety games and, thus, reactive safety controllers. Our framework is a generalization of earlier work by Neider and Topcu [9], which encodes (countably) infinite game graphs using finite automata and uses automaton learning to learn winning sets. By contrast, the framework proposed here allows for game graphs with uncountably many vertices and is not restricted to a symbolic representation in terms of finite automata.
As illustrated in Figure 2, the learning takes place in a counterexample-guided feedback loop (CEGIS) [2] with two entities: a teacher, who has knowledge about the safety game, and a learner (or learning algorithm), whose objective is to learn a winning set, but who is agnostic to the game. In every iteration of the loop, the learner conjectures a set , called hypothesis, based on the information about the game it has accumulated so far. Then, the teacher checks whether this set is in fact a winning set—queries of this type are often called equivalence or correctness queries. Although the teacher does not know a winning set (the task is to learn one after all), it can verify whether the hypothesis is one by checking Conditions (1) to (4) of Definition 1. If the hypothesis satisfies these conditions, then is a winning set and the learning stops. If this is not the case, the teacher replies with a counterexample that witnesses the violation of one of these conditions. Then, the feedback loop continues until a winning set has been found. The definition below formalizes the concept of counterexamples and fixes the communication between the teacher and the learner.
Definition 2** (Teacher for Safety Games).**
Let be a safety game. Given a hypothesis , the teacher replies as follows (whereby the order in which the checks are performed is arbitrary):
If , then the teacher returns a positive counterexample . 2. 2.
If , then the teacher returns a negative counterexample . 3. 3.
If there exists a with , then the teacher returns an existential counterexample with . 4. 4.
If there exists a with , then the teacher returns a universal counterexample with .
If passes all four checks, then the teacher returns “yes”.
Each case of Definition 2 corresponds to one condition of Definition 1. Thus, it is not hard to verify that a given hypothesis is in fact a winning set if the teacher replies “yes” (as it satisfies Definition 1). Counterexamples, on the other hand, witness the violation of one of these conditions and guide the learner towards a winning set by communicating exactly why the hypothesis is incorrect. For instance, the meaning of a positive counterexample is that any future hypothesis needs to include this vertex (as it is initial), whereas a negative counterexample must be excluded (as it is not a safe vertex). An existential counterexample signals that the hypothesis is not existentially closed and requires that if a future hypothesis contains , it also needs to contains an least one vertex of the vertices . Similarly, a universal counterexample signals that the hypothesis is not universally closed and requires that if a future hypothesis contains , it needs to contains all vertices . Note that existential and universal counterexamples are always finite objects since we assume to be finite for every .
Let us illustrate the overall learning process with an example.
Example 2**.**
We continue Example 1. All hypotheses produced in the course of the learning process are depicted in Figure 3. Gray shaded areas indicate vertices in the hypothesis, safe vertices are surrounded by a bold line, and initial vertices are indicated by diagonal lines.
Let us assume that the learner proposes the hypothesis in the first iteration of the loop, which is shown in Figure 3(a). Since this hypothesis does not include any initial vertex, it does not satisfy Condition (1) of Definition 1. Thus, the teacher returns a positive counterexample, say .
Next, suppose that the learner proposes the hypothesis , shown in Figure 3(b). This hypothesis is consistent with the positive counterexample. However, it also includes unsafe vertices and, thus, does not satisfy Condition (2) of Definition 1. Hence, the teacher replies with a negative counterexample, say .
Let us now assume that the learner conjectures , depicted in Figure 3(c). This conjecture is consistent with both the positive and the negative counterexample. However, it is not existentially closed because Player [math] has no choice but to move the robot outside of . Therefore, the teacher replies with an existential counterexample, say .
Suppose now that the learner proposes the hypothesis in the fourth iteration, shown in Figure 3(d). Although this conjecture is consistent with all counterexamples received so far, it is not universally closed because Player can move the robot into the interval , which is not included in . Thus, the teacher returns a universal counterexample, say .
*Finally, the learner proposes the hypothesis , depicted in Figure 3(e). This hypothesis satisfies all conditions of Definition 2. Thus, is a winning set and the learning terminates. *
Our learning framework is straightforward to implement if the underlying game graph is finite, in which case the teacher can be built on top of an explicit representation of the game. However, if the underlying game graph becomes too large or is infinite, one has to choose a suitable representation for sets of vertices and the edge relation that allows performing operations on the graph symbolically. More precisely, the chosen symbolic representation must feature Boolean operations (i.e., union, intersection, and complementation), and the image and preimage of symbolically represented sets need to be computable. Moreover, the emptiness problem (i.e., “given a set , decide whether ”) needs to be decidable, and it must be possible to extract an element from if it is nonempty. Examples of such symbolic representations include many common formalisms such as finite automata, various types of decision diagrams, and first-order formulas in linear integer arithmetic and real arithmetic.
Furthermore, hypotheses must also be expressible in the chosen symbolic formalism. However, to build efficient learning algorithms, it is often necessary to restrict the class of hypotheses—typically called the hypothesis space and denoted by —even further. Examples of such restricted hypothesis spaces are conjunctive formulas (e.g., as used by the popular Houdini algorithm [15]) or decision trees, which are common, for instance, in learning-based software verification [11, 16, 17, 12] and also used in this work. Note that it might happen that a winning set exists, though it cannot be expressed as a hypothesis in the hypothesis space. Thus, the choice of the hypothesis space and, hence, the learning algorithm needs to be made carefully in order for the learning to succeed.
A second important property of learning algorithms is what we call “consistency”. To make this notion precise, let us assume that the learner accumulates counterexamples in a so-called game sample consisting of a finite set of positive counterexamples, a finite set of negative counterexample, a finite set of existential counterexamples, and a finite set of universal counterexamples. Then, we say that a hypothesis is consistent with a game sample if
(1) for each ,
(2) for each ,
(3) implies for each , and
(4) implies for each .
Moreover, we call a learner consistent if it always produces a consistent hypothesis. Consistency is an important property as it prevents the learner from making the same mistake twice and ensures progress towards a winning set.
In fact, the notion of consistency allows us to show that our framework is sound in the sense that any consistent learner learns a winning set in the limit if one exists in the chosen hypothesis space. This is formalized in the next theorem.
Theorem 1**.**
Let a teacher for a safety game (as described in Definition 2) and a consistent learner over an hypothesis space be given. If there exists a winning set expressible as a hypothesis in , then there exists an ordinal , where denotes the class of all ordinals, such that the learner proposes a winning set after at most iterations.
Theorem 1 is a consequence of the fact that our learning framework is in instance of an abstract learning framework for synthesis (ALF), as introduced by Löding et al. [18]. The proof roughly proceeds as follows. Since the teacher of Definition 2 allows “progress” (i.e., every counterexample refutes the current hypothesis) and we assume the learner to be consistent, the learner never conjectures the same hypothesis twice. In the worst case, the learner will have exhausted all incorrect hypotheses after iterations for an ordinal with cardinality less or equal to . Since the teacher is also “honest” (i.e., it does not return spurious counterexamples), the learner necessarily produces a winning set in the subsequent iteration if one exists.
Finally, let us point out that the safety games as defined in Section II are very general and even allow encoding computations of Turing machines. Consequently, determining the winner of such safety games is undecidable in general, and any algorithm for computing winning sets can be a semi-algorithm at best (i.e., an algorithm that, on termination, gives the correct answer but does not guarantee to halt). The algorithm we design in the next section is of this kind.
IV Learning Decision Trees from Game Samples
We now fix the hypothesis space to be the class of all decision trees (as defined shortly) and describe an algorithm to learn consistent decision trees from game samples. Our algorithm builds on top of a learning algorithm recently proposed by Ezudheen et al. [4] in the context of software verification. To ease the presentation in this section, we abstract from the setting of infinite games and assume that the data to learn from is taken from an abstract domain , whose elements we call data points. We encourage the reader to think of data points and vertices as synonyms and define concepts such as game samples and consistency analogously for data points.
An example of a decision tree is shown in Figure 4. In general, decision trees are binary trees whose inner nodes are labeled with predicates from an a priori fixed set and whose leaves are labeled with Boolean values. In this context, each predicate is a function that maps data points to a Boolean values and corresponds to a property of interest. Typically, the set is finite, but Ezudheen et al.’s algorithm can build decision trees even from infinite sets of predicates if the underlying domain is numeric. The algorithm we design in this section retains this feature.
Example 3**.**
Consider Example 1 from Section III. Remember that in this game a robot moves in an unbounded, one-dimensional grid world. Both players can move the robot in an horizontal direction by one cell and are taking turns controlling the robot. Player [math]’s objective is to keep the robot inside or right of the cell at position [math]. An illustration of this game is shown in Figure 1, where the initial starting position of the robot is shown at .
A possible encoding uses two variables , as described next. The variables corresponds to the -coordinate, while the variable indicates which player is currently in control of the robot. In the case of , Player [math] is in control, thus . On the other hand, if , then Player is in control of the robot, thus . Since the starting position is arbitrary within the interval which is controlled by Player [math], which means that the set of initial vertices is . Furthermore, we fixed the set of safe vertices to be at position greater or equal [math], thus . Finally, it is left to define the edge relation. The robots movement to the right, for instance, can be defined by the relation . The direction to the left can be encoded analogously, and the edge relation is the union of both directions.
Figure 5 depicts the set of Example 2. We first observe that the predicate at the root node is . Since determines the position of the robot, the left subtree (i.e., where is true) encodes vertices with position , while the right subtree (i.e., where is false) encodes vertices with position .
Let us first consider the left subtree. There is only one path leading to the unique leaf node labeled with [math]. All of these vertices are not safe and correspond to the white area left of position in Figure 3(e). Conversely, only the vertices satisfying and lead to the unique leaf node labeled with in the right subtree. This corresponds to the gray-shaded area in Figure 3(e). All of these vertices are safe. It is not hard to verify that Player [math] can stay within these sets, whereas Player cannot force a play to an outside vertex. Moreover, the set of initial vertices is included. Thus, the learned decision tree in fact encodes a winning set.
Note that in the definition of the winning set requires to hold. However, this is not true in this example since the decision tree evaluates to even in situations where , while contains only vertices with . Note that this is not a problem in our example since the only checks that require to hold true are the checks for existential and universal closedness. Those checks require the winning set to be intersected with either or , which effectively ignores all vertices that violate .
In conclusion, the winning set depicted in Figure 3(e) consists of the 3 cells in the interval . The decision tree of Figure 5 describes this set in an easy to understand manner.
In the remainder of this section, we view a decision tree as a representation of an (infinite) set of data points. Whether a data point belongs to this set depends on its valuation , which is defined as follows: starting at the root node, we recursively descend left (right) if satisfies (does not satisfy) the predicate at the current node and define to be the label of the leaf node that is ultimately reached by this procedure. The set of data points represented by is then simply the set . Since the learner proposes this set as a hypothesis to the teacher, we call a decision tree consistent with a game sample if is consistent with .
Ezudheen et al.’s algorithm has been developed in the context of software verification and expects so-called Horn samples as input. Formally, a Horn sample is a finite set containing Horn constraints of the form or where and are data points. Note that the left-hand-side of a Horn constraint might be empty (i.e., ), in which case it is interpreted as .
Given a Horn sample , Ezudheen et al.’s algorithm learns a decision tree that is consistent with in the sense that
(1) implies for each Horn constraint of the form in and
(2) for each Horn constraint of the form in ;
by extension, we then also say that is consistent with . Should no consistent decision tree exist (e.g., because the set does not contain sufficient predicates to separate positive and negative examples), the algorithm aborts with an error. The theorem below summarizes these key properties.
Theorem 2** (Ezudheen et al. [4]).**
Let be a Horn sample and a finite set of predicates, both over the domain . Moreover, let be the number of data points and the number of Horn constraints in . If a decision tree over exists that is consistent with , then Ezudheen et al.’s algorithm learns one in time , assuming that predicates can be evaluated in constant time.
We are now ready to present our algorithm for learning decision trees from game samples, which is shown in pseudo code as Algorithm 1. It builds on top of Ezudheen et al.’s algorithm and proceeds in three steps: first, it translates a game sample into an “equivalent” Horn sample ; then, it applies Ezudheen et al.’s learning algorithm to obtain a decision tree that is consistent with ; finally, it translates into a decision tree , which is consistent with . More precisely, the Horn sample constructed in Step 1 has the property that for every decision tree the set is consistent with if and only if its complement is consistent with the game sample . Since the decision tree obtained in Step 3 of our algorithm satisfies , it is thus consistent with . This property is formalized next.
Lemma 1**.**
Let be a game sample and a finite set of predicates, both over the domain . Moreover, let , , and be as in Algorithm 1. Then, is consistent with if and only if is consistent with .
Proof of Lemma 1.
Since consistency is a conjunction of conditions for individual examples and Horn constraints, respectively, we show Lemma 1 for each element of a sample individually. Recall that .
- •
Let be a positive example and the corresponding Horn constraint generated in Step 1. Then,
[TABLE]
- •
Let be a negative example and the corresponding Horn constraint generated in Step 1. Then,
[TABLE]
- •
Let be an existential counterexample and the corresponding Horn constraint generated in Step 1. Then,
[TABLE]
- •
Let be a universal implication and the corresponding Horn constraints generated in Step 1. Then,
[TABLE]
In total, we obtain that is consistent with if and only if is consistent with . ∎
The correctness of Algorithm 1, stated in Theorem 3 below, is now a direct consequence of Lemma 1 and Theorem 2. Note that the term in the runtime estimation stems from the fact that each universal counterexample might have data points on its right-hand-side, resulting in Horn constraints.
Theorem 3**.**
Let be a game sample and a finite set of predicates, both over the domain . Moreover, let be the number of data points in . If a decision tree over exists that is consistent with , then Algorithm 1 learns one in time \mathcal{O}\bigl{(}n|\mathcal{P}|+n^{2}(|\mathit{Pos}|+|\mathit{Neg}|+|\mathit{Ex}|+n|\mathit{Un}|)\bigr{)}, assuming that predicates can be evaluated in constant time.
If the abstract domain is numeric, say for some , a simple extension of Ezudheen et al.’s algorithm, and hence Algorithm 1, can be used to learn decision trees even over infinite sets of predicates. This extension was originally proposed by Quinlan [19] and assumes the set of predicates to be where denotes the -th component of the data point . The key idea is in fact straightforward: to separate two data points (e.g., a positive and negative data point), it suffices to consider only such predicates for which the constant occurs as an actual value in or . Thus, Algorithm 1 can always restrict itself to a finite subset of , which only depends on the values of the data points in the given sample.
In many situations, Algorithm 1 in fact guarantees that the overall feedback loop of Section III converges to a winning set in finite time if one can be expressed as a decision tree. For instance, if the set is finite, then there exist only finitely many semantically different decision trees (in terms of the set of data points they represent). Since Algorithm 1 always produces consistent decision trees, it will have exhausted all incorrect trees after a finite amount of time. If a winning set exists and can be expressed as a decision tree over , then the subsequent hypothesis will necessarily be one. On the other hand, if the underlying domain is numeric and Algorithm 1 operates over the infinite set , a technique proposed by Ezudheen et al. can be used to guarantee that a winning set will be learned in finite time (if one can be expressed as a decision tree over ). Due to the limited space, we have to refer the reader to Ezudheen et al. [4] for details and can here only state our main result.
Theorem 4**.**
Let be a safety game. If the learner of Section III uses Algorithm 1 over a finite set of predicates, then it is guaranteed to learn a winning set after a finite number of iterations if there exists one that is expressible as a decision tree over . The analogous statement holds for the set .
In practice, the overall runtime depends not only on the learner but also on the teacher (which might be adversarial). Hence, an average case analysis requires further assumptions on the teacher. We leave such an analysis for future work and turn to an experimental evaluation instead.
V Experimental Evaluation
To assess the performance of our learning framework and the decision tree learning algorithm, we have implemented a prototype named DT-Synth.111Code/benchmarks available at https://github.com/OliverMa1/DT-Synth. In this section, we describe DT-Synth in detail and compare it to three existing tools: CONSYNTH [6], which is based on constraint solving, as well as two tools based on learning finite automata [9], for brevity here called SAT-Synth and RPNI-Synth.
V-A Tools
DT-Synth takes games as input that are encoded as quantifier-free, first-order formulas in the theory of linear integer arithmetic (LIA). More precisely, variables of type integer encode vertices of a game graph, while formulas over those variables represent (infinite) sets of vertices. Additionally, the edge relation of a game graph is encoded by a formula where represents a source-vertex and a destination-vertex. We believe that this encoding of games is highly accessible and allows modeling many real-word synthesis tasks (e.g., robotic motion planning) in a natural way. Moreover, the decision trees computed by DT-Synth are generally easy for humans to comprehend, as we demonstrate in Section IV. Note, however, that the choice of LIA restricts our game graphs to countably many vertices. We have made this choice deliberately to be able to reuse Ezudheen et al.’s implementation of the decision tree learner [4]. DT-Synth implements the teacher and learner as follows:
Teacher
The teacher builds on top of the Z3 SMT solver [7]. Upon receiving a hypothesis in the form of a formula , it performs a series of satisfiability checks according to Definition 2 in order to search for counterexamples. If a satisfiability check succeeds, the teacher derives a counterexample from the model returned by Z3 (potentially triggering a finite number of additional satisfiability checks to compute the successors of a vertex in the case of existential and universal counterexamples).
Learner
The learner builds on top of code originally developed by Ezudheen et al. [4]. In addition to the set of predicates described in Section IV, it additionally uses octagonal constraints of the form (which capture distances between vertices in the Euclidean space). After a consistent decision tree has been learned, the learner converts it into a formula where is the set of all paths from the root to a leaf labeled with , and denotes that the predicate occurs on (negated if the path descends to the right). Then, it hands over to the teacher. Following the description in Section IV, it is not hard to verify that a vertex satisfies the formula obtained from the tree if and only if the vertex belongs to .
We compared DT-Synth to three other tools: CONSYNTH, SAT-Synth and RPNI-Synth. Table I summarizes the main similarities and differences of these tools.
CONSYNTH [6] is a tool for synthesizing winning strategies in games over infinite graphs with -regular winning conditions (which are more general than the safety games considered in this paper). The tool reduces the computation of a winning strategy into a satisfiablity problem for Constrained Horn Clauses (CHCs) [20], which allows it to leverage the power of modern SMT solvers. Similar to DT-Synth, CONSYNTH’s input is a game (with potentially unaccountably many vertices) encoded as an SMT formula. However, to remove existential quantifiers that arise from translating the synthesis problem into CHCs, CONSYNTH relies on so-called Skolem templates, which have to be specified by the user (we discuss the impact of Skolem templates later in this section). Intuitively, these templates constrain the search space of potential strategies and, hence, require insight into what a winning strategy might be (and the ability to express this insight as a Skolem template). By contrast, DT-Synth does not require additional help from the user (it synthesizes winning sets/strategies based on the game alone), but it can also not benefit from the user’s knowledge. The output of CONSYNTH is values for the parameters in the Skolem template (encoding a winning strategy) and a corresponding winning set in form of an SMT formula. Both objects can be challenging for humans to understand. Similar to DT-Synth, CONSYNTH guarantees to find a winning strategy if one can be expressed in terms of the Skolem template.
SAT-Synth and RPNI-Synth [9] are two techniques for synthesizing winning strategies of safety games over game graphs with countably many vertices. Both tools expect the game encoded by means of finite automata/transducers and leverage automata learning techniques to compute winning sets: SAT-Synth internally uses a SAT solver to learn automata, wheres RPNI-Synth is based on the popular RPNI learning algorithm [21]. Similar to DT-Synth and unlike CONSYNTH, both tools do not require additional information from the user. However, both the input and output of SAT-Synth and RPNI-Synth are finite automata, which is not a very natural encoding of games and can be difficult for humans to understand. Finally, SAT-Synth guarantees to find a winning set (if it can be expressed as an automaton), whereas RPNI-Synth does not.
V-B Benchmarks
We have evaluated the performance of all four tools on two benchmark suits, both featuring safety games over infinite game graphs.222Note that our approach is designed for infinite graphs but not to compete with highly-optimized synthesis engines on games over finite graphs. Hence, we did not consider benchmarks from the various synthesis competitions. The first benchmark suite accompanies SAT-Synth/RPNI-Synth [9] and consists of seven safety games, which are motivated by robotic motion planning. The second suite of benchmarks is shipped with CONSYNTH [6] and consists of five safety games: two versions of a combinatorial puzzle (Cinderella game), two program repair problems, and a synchronization problem for multi-threaded programs.
First Benchmark Suite
Our first benchmark suit comprises the following seven games, taken from Neider and Topcu [9]. Most games are from the area of robotic motion planning, and all are over infinite graphs.
Diagonal game:
A robot moves in an infinite, discrete two-dimensional grid world. Player [math] controls the robot’s vertical movement, while Player controls the horizontal. Player [math] wins if the robot stays within two cell around the diagonal.
Box game:
A variation of the diagonal game. Both players can move the robot in an vertical, horizontal or diagonal direction by one cell. Player [math] wins if the robot stays within a horizontal stripe of width three.
Limited Box game:
A variation of the box game. Player [math] can only control the robot’s vertical movement and Player the horizontal.
Solitary box game:
Another variation of the Box game in which only Player [math] is in control of the robot.
Evasion game:
Two robots are moving in an infinite, discrete two-dimensional grid world. The robots take turns moving at most one cell in any direction. Each players controls one robot. Player [math]’s objective is to avoid getting caught by Player ’s robot.
Follow game:
A version of the evasion game where Player [math]’s goal is to keep its robot within a Manhattan distance of two cells to the environment’s robot.
Square game:
A variation of the box game, where Player [math] wins if the robot stays within a fixed size square (here ).
Second Benchmark Suite
Our second benchmark suit comprises the following five games, taken from Beyene et al. [6]. Most games are from the area program synthesis and program repair, thus involve one player only.
Cinderella game:
The Cinderella game is a turn-based game, originally posed as a challenge for the synthesis community [22]. It involves the protagonist, the mythical Cinderella, and the antagonist, her stepmother. The game starts with five empty buckets, which are arranged in a circle and can hold up to a constant units of water. In every round of the game, the stepmother brings one unit of water, which she distributes arbitrarily among the five buckets. Then, Cinderella is allowed to empty two adjacent buckets. The stepmother wins if she can make one of the buckets overflow. Cinderella, on the other hand, wins if she can indefinitely prevent all buckets from overflowing. Beyene et al. considered two versions of the game for and , respectively, and have discretized the game such that the stepmother can distribute the water only in units.
Program repair game:
The program-repair game looks for a modification of statements such that the modified program satisfies its specification.
Repair-critical game:
The Repair-critical game is a game derived from concurrent program repair problems under fairness assumption.
Synthesis Synchronization game:
This game is an example for synthesis of synchronization in multi-threaded programs.
For the first benchmark suite, we have equipped CONSYNTH with moderately restrictive Skolem templates; templates for the second suite were provided by Beyene et al. The representations of winning sets/strategies for all four tools are expressive enough for all games in these benchmark suits.
V-C Results
Table II lists the experimental results for all four tools on the SAT-Synth/RPNI-Synth benchmark suite [9] (upper part) and the CONSYNTH benchmark suite [6] (lower part).333Note that we were not able to reproduce the exact results for CONSYNTH reported by Beyene et al. [6]. CONSYNTH has numerous options, which influence the performance of the benchmarks drastically. We have been in contact with the authors to obtain the exact options that were used in their experiments. Unfortunately, some of these options resulted in crashes and incorrect results on our machine. Hence, we resorted to the default options. In the case of DT-Synth, RPNI-Synth, and SAT-Synth, Table II also shows the number of iterations as well as the size of the result (measured in the number of inner nodes in a decision tree and the number of states of an automaton, respectively). We have conducted all experiments on an Intel Xeon E7-8857 v2 CPU with GB of RAM running a 64-bit Debian operating system. The timeout was .
We rank the performance of the tools based on the number of games they can solve. To break ties, we consider the aggregate runtime on games that the tools were able to solve (i.e., not accounting for time-outs). With this scoring scheme, DT-Synth performed best and solved ten out of twelve games with an aggregated runtime of . RPNI-Synth ranked second, having solved the same ten games but with an aggregated runtime of (i.e., times slower than DT-Synth). CONSYNTH ranked third and solved nine games with an aggregated runtime of . Finally, SAT-Synth performed worst, having solved seven games with an aggregate runtime of .
Compared to RPNI-Synth and SAT-Synth, DT-Synth required far fewer iterations and, hence, fewer interactions with the typically computationally expensive teacher. Moreover, the size of the final output is smaller, which makes it easier for humans to interpret. It is also important to emphasize that despite RPNI-Synth’s good performance, the tool does not guarantee to find a winning set. By contrast, DT-Synth provides such a guarantee.
Note that CONSYNTH was the only tool able to solve at least one version of the Cinderella game. We believe that this is due to the somewhat restrictive Skolem template that Beyene et al. have provided: if equipped with less restrictive templates, CONSYNTH times out as well.
In conclusion, DT-Synth is competitive to the state-of-the-art tools for solving safety games over infinite graphs. It does not require any user guidance, guarantees to find a winning set (if one can be expressed as a decision tree), and features easy-to-understand input/output formats.
V-D Impact of skolem templates
To assess the impact of Skolem templates on the performance of CONSYNTH, we have conducted a case study based on the Cinderella game. This case study consists of a series of Skolem templates for Cinderella’s strategy that successively permit more and more complex behavior. Our goal is to determine the point at which Skolem templates become too permissive and CONSYNTH is no longer able solve the game within a reasonable time frame.
In the following, we explain the Skolem templates provided by Beyene et al. [6], describe our less restrictive templates in detail, and finally discuss the outcome of our experiments with these templates. For the remainder of this subsection, we focus on the game with as it has a winning strategy for Cinderella that is easy to understand.
Skolem templates and the Cinderella game
Beyene et al.’s formulation of (safety) games in terms of Constrained Horn Clauses (CHCs) relies on existential quantifiers to express the effects of actions of Player [math]. In order to eliminate these quantifiers and make the resulting formulas amenable to constraint solving, CONSYNTH uses so-called Skolem templates. Intuitively, Skolem templates are user-provided formulas that capture high-level intuitions about potential winning strategies and, hence, restrict the space of strategies that CONSYNTH has to consider.
In the case of the Cinderella game with the Skolem template provided by Beyene et al. only allows quite restrictive strategies: the choice of buckets that Cinderella has emptied in the current round always dictates the choice of buckets that she will empty in the next round, completely ignoring which buckets the stepmother actually fills. In fact, it turns out that continuously emptying Buckets and , then Buckets and , and then Buckets and is a winning strategy.
Intuitively, Beyene et al.’s Skolem template can be seen as an encoding of a finite-state machine with five states , where the state prescribes that Cinderella empties Buckets and (e.g., the winning strategy above corresponds to the finite-state machine depicted in Figure 6(a)). In the Skolem template, the transitions of the finite-state machine are left undefined, and solving the game amounts to directing the transitions in a way that the resulting strategy is winning for Cinderella. However, without solving the game first, it is very hard to know beforehand that a winning strategy of this form actually exists.
Constructing less restrictive templates
Generating less restrictive Skolem templates is now straightforward: instead of five states, we successively increase the number of states, thus allowing for incrementally more complex behavior of Cinderella. For instance, the finite-state machine in Figure 6(b) prescribes Cinderella to empty Buckets and twice before proceeding to emptying Buckets and as well as Buckets and . Note, however, that the strategy obtained from Beyene et al.’s original Skolem template always remains possible (as long as the encoded finite-state machine contains at least the states , , and ).
Impact of less restrictive Skolem templates
We have run CONSYNTH on the Cinderella game with for a series of Skolem templates that encode finite-state machines of increasing size. Recall that CONSYNTH computed a winning strategy using Beyene et al.’s original template (consisting of five states) in . For a template encoding six states (one more state in which Cinderella can empty Buckets and ), CONSYNTH computed a winning strategy in However, already for a template encoding seven states (the six-state template above plus an additional state in which Cinderella can empty Buckets and ), CONSYNTH was unable to find a winning strategy within . In particular, note that giving Cinderella the option to empty Buckets and twice increases the search space for strategies unnecessarily because Cinderella never requires to empty those two adjacent buckets in order to win.
The Skolem templates used in CONSYNTH are powerful tools to facilitate the computation of winning strategies, even for difficult safety games. However, designing such templates usually requires (high-level) knowledge about the form of winning strategies and the ability to express this knowledge in terms of logical formulas. Moreover, our experiments show that using even slightly suboptimal templates (e.g., by allowing one or two too additional moves for Player [math]) can lead to an increase in computational time that is prohibitive in practice.
VI Conclusion
We have developed a machine learning framework for synthesizing reactive safety controllers whose interaction with their environment is modeled by games over infinite graphs. Moreover, we have designed a learning algorithm for decision trees that learns winning sets/strategies and is in many situations guaranteed to find a solution if one exists. Our experimental evaluation shows that our approach is highly competitive and promises applicability to a wide range of interesting practical problems, specifically due to its ease of use.
A promising direction for future work would be to apply our technique to distributed synthesis problems and other, more complex synthesis settings. Moreover, we plan to extend our learning-based framework to more general winning conditions, such as reachability and liveness.
Acknowledgment
This work was partially funded by the ERC Starting Grant AV-SMP (grant agreement no. 759969)
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] M. B. Dwyer, G. S. Avrunin, and J. C. Corbett, “Patterns in property specifications for finite-state verification,” in Proceedings of the 1999 International Conference on Software Engineering, ICSE’ 99, Los Angeles, CA, USA, May 16-22, 1999. ACM, 1999, pp. 411–420.
- 2[2] A. Solar-Lezama, “The sketching approach to program synthesis,” in Programming Languages and Systems, 7th Asian Symposium, APLAS 2009, Seoul, Korea, December 14-16, 2009. Proceedings , ser. Lecture Notes in Computer Science, vol. 5904. Springer, 2009, pp. 4–13.
- 3[3] T. Brázdil, K. Chatterjee, J. Kretínský, and V. Toman, “Strategy representation by decision trees in reactive synthesis,” in Tools and Algorithms for the Construction and Analysis of Systems - 24th International Conference, TACAS 2018, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2018, Thessaloniki, Greece, April 14-20, 2018, Proceedings, Part I , ser. Lecture Notes in Computer Science, vol. 10805. Springer, 2018, pp. 385–407.
- 4[4] P. Ezudheen, D. Neider, D. D’Souza, P. Garg, and P. Madhusudan, “Horn-ice learning for synthesizing invariants and contracts,” PACMPL , vol. 2, no. OOPSLA, pp. 131:1–131:25, 2018.
- 5[5] O. Kupferman, N. Piterman, and M. Y. Vardi, “An automata-theoretic approach to infinite-state systems,” in Time for Verification, Essays in Memory of Amir Pnueli , vol. 6200. Springer, 2010, pp. 202–259.
- 6[6] T. A. Beyene, S. Chaudhuri, C. Popeea, and A. Rybalchenko, “A constraint-based approach to solving games on infinite graphs,” in The 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’14, San Diego, CA, USA, January 20-21, 2014 . ACM, 2014, pp. 221–234.
- 7[7] L. M. de Moura and N. Bjørner, “Z 3: an efficient SMT solver,” in Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference, TACAS 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29-April 6, 2008. Proceedings , ser. Lecture Notes in Computer Science, vol. 4963. Springer, 2008, pp. 337–340.
- 8[8] D. Neider, “Reachability games on automatic graphs,” in Implementation and Application of Automata - 15th International Conference, CIAA 2010, Winnipeg, MB, Canada, August 12-15, 2010. Revised Selected Papers , ser. Lecture Notes in Computer Science, vol. 6482. Springer, 2010, pp. 222–230.
