Revisiting Variable Ordering for Real Quantifier Elimination using   Machine Learning

John Hester; Briland Hitaj; Grant Passmore; Sam Owre; Natarajan; Shankar; Eric Yeh

arXiv:2302.14038·cs.FL·February 28, 2023

Revisiting Variable Ordering for Real Quantifier Elimination using Machine Learning

John Hester, Briland Hitaj, Grant Passmore, Sam Owre, Natarajan, Shankar, Eric Yeh

PDF

Open Access

TL;DR

This paper improves variable ordering for Cylindrical Algebraic Decomposition in formal verification by creating a more balanced dataset and evaluating machine learning models' generalizability.

Contribution

It introduces a new, less biased dataset for training ML models on variable ordering and assesses their performance and generalizability.

Findings

01

New dataset with over 41K challenges reduces bias.

02

Machine learning models show improved generalization on the new dataset.

03

Addressing dataset bias enhances the effectiveness of variable ordering strategies.

Abstract

Cylindrical Algebraic Decomposition (CAD) is a key proof technique for formal verification of cyber-physical systems. CAD is computationally expensive, with worst-case doubly-exponential complexity. Selecting an optimal variable ordering is paramount to efficient use of CAD. Prior work has demonstrated that machine learning can be useful in determining efficient variable orderings. Much of this work has been driven by CAD problems extracted from applications of the MetiTarski theorem prover. In this paper, we revisit this prior work and consider issues of bias in existing training and test data. We observe that the classical MetiTarski benchmarks are heavily biased towards particular variable orderings. To address this, we apply symmetries to create a new dataset containing more than 41K MetiTarski challenges designed to remove bias. Furthermore, we evaluate issues of information…

Tables2

Table 1. Table 1 : Performance of different models trained on Dataset 1 when evaluated on testing set 1 and entire Dataset 2.

Model Trained on

Dataset 1

Performance on

Training Set 1

Performance on

Testing Set 1

Performance on

Dataset 2 (all)

SVM

69.38%

58.88%

28.9%

k-NN

75.27%

57.36%

32.21%

DT

75%%

55.69%

31.44%

RF

76.39%

58.23%

34.15%

MLP

58.81%

53%

33.64%

Table 2. Table 2 : Performance of different models trained on Dataset 2 when evaluated on testing set 2 and entire Dataset 1.

Model Trained on

Dataset 2

Performance on

Training Set 2

Performance on

Testing Set 2

Performance on

Dataset 1 (all)

SVM

62.1%

57.39%

60.43%

k-NN

69.2%

54.9%

66.28%

DT

68.03%

55.04%

64.16%

RF

70.47%

55.07%

66.96%

MLP

50.51%

49.62%

48.19%

Equations4

φ = (i = 1 ⋀ k p_{i} (x_{1}, \dots, x_{n}) ⊙_{i} 0), p_{i} \in Q [x_{1}, \dots, x_{n}], ⊙_{i} \in {<, \leq, =, \geq, >} .

φ = (i = 1 ⋀ k p_{i} (x_{1}, \dots, x_{n}) ⊙_{i} 0), p_{i} \in Q [x_{1}, \dots, x_{n}], ⊙_{i} \in {<, \leq, =, \geq, >} .

\exists x_{1}, \dots, \exists x_{n} \in R . φ (x_{1}, \dots, x_{n})

\exists x_{1}, \dots, \exists x_{n} \in R . φ (x_{1}, \dots, x_{n})

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsFormal Methods in Verification · Software Testing and Debugging Techniques · Adversarial Robustness in Machine Learning

MethodsTest

Full text

11institutetext: Imandra Inc., Austin, TX 78704, USA

11email: {john,grant}@imandra.ai

22institutetext: SRI International, Menlo Park, CA 94025, USA 22email: {briland.hitaj,natarajan.shankar,sam.owre,eric.yeh}@sri.com

Revisiting Variable Ordering for Real Quantifier Elimination using Machine Learning

John Hester Authors ordered alphabetically.

This research was, in part, developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government or DARPA.11

Briland Hitaj 22

Grant Passmore 11

Sam Owre 22

Natarajan Shankar 22

Eric Yeh 22

Abstract

Cylindrical Algebraic Decomposition (CAD) is a key proof technique for formal verification of cyber-physical systems. CAD is computationally expensive, with worst-case doubly-exponential complexity. Selecting an optimal variable ordering is paramount to efficient use of CAD. Prior work has demonstrated that machine learning can be useful in determining efficient variable orderings. Much of this work has been driven by CAD problems extracted from applications of the MetiTarski theorem prover. In this paper, we revisit this prior work and consider issues of bias in existing training and test data. We observe that the classical MetiTarski benchmarks are heavily biased towards particular variable orderings. To address this, we apply symmetries to create a new dataset containing more than 41K MetiTarski challenges designed to remove bias. Furthermore, we evaluate issues of information leakage, and test the generalizability of our models on the new dataset.

Keywords:

Variable Ordering Cylindrical Algebraic Decomposition Machine Learning Generalizability Bias.

1 Introduction

Cylindrical Algebraic Decomposition (CAD) is a key proof technique for formal verification of cyber-physical systems such as aircraft collision avoidance systems, autonomous vehicles and medical robotics. While CAD is a complete decision procedure, it is computationally expensive with worst-case exponential complexity. However, the expense of running CAD on a given problem can be greatly affected by the way in which the problem is posed. In particular, selecting an optimal variable ordering is paramount to efficient use of CAD. Prior work has demonstrated that machine learning (ML) may be fruitfully applied to determining efficient variable orderings [4, 7]. Much of this work has been driven by CAD problems extracted from applications of the MetiTarski theorem prover [1, 10].

Generalizability is a major concern for ML-trained systems [6, 8]. For example, in visual object classification systems, if objects to classify in training sets always occur in the upper left of an image, the classifier may fail if it encounters objects in the bottom right.

The identification of symmetries is a major factor in creating robust ML systems. Symmetries are transformations of data that preserve relevant semantics. Image based object classification provides many natural examples, where transformations such as mirror flipping or large random crops do not change the identity of the object in the image. These symmetries form the basis of data augmentation methods, where such transformations applied over existing training data can create additional training data. This augmentation can substantially improve the robustness of trained systems [5].

We argue that this problem also exists in applications to mathematical tasks such as CAD variable ordering. In this case, variables may be permuted over both statements and target orderings while preserving semantics. Performing an augmentation that debiases using formula symmetries is important for maintaining performance in a statistically trained ML system.

1.0.1 Contributions:

•

We observe bias in the classical MetiTarski benchmarks often used in training and evaluating ML-based CAD variable ordering selection systems.

•

We show how data augmentation along inherent symmetries can remove bias and improve generalizability, resulting in a new “debiased” benchmark set.

•

We argue that this phenomenon is more general than CAD, and that debiasing using formula symmetries should be a standard tool for applications of ML in computer algebra, program verification, and other fields manipulating mathematical formulas.

2 The CAD Variable Ordering Problem

2.1 CAD-based Real Quantifier Elimination

Consider a system of nonlinear equations and inequalities over $\mathbb{R}$ :

[TABLE]

CAD may be used to decide the satisfiability of this system, i.e., the truth of

[TABLE]

by (a) partitioning $\mathbb{R}^{n}$ into finitely many connected regions called cells $C=\{c_{1},\mathellipsis,c_{m}\}\subseteq\mathbb{R}^{n}$ s.t. each $p_{i}$ is sign-invariant over each $c_{j}$ , and (b) evaluating $\varphi$ at a sample point $s_{j}\in c_{j}$ drawn from each cell. Given sign-invariance and the fact that $C$ is a covering ( $\bigcup C=\mathbb{R}$ ), it follows that $\varphi$ is satisfiable over $\mathbb{R}^{n}$ iff it evaluates to true on at least one such sample point. This approach can be readily extended to formulas involving quantifier alternation [3, 9].

2.2 Efficient Variable Ordering

CAD construction proceeds in stages, eliminating each variable $x_{i}$ in turn. The variable ordering is the order in which variables are eliminated. In the present work, we are concerned with determining an efficient variable ordering for a given problem. For example, consider a CAD induced by polynomials $P=\{68x_{1}^{2}-12x_{3}x_{2}+46x_{3}-126,-54x_{2}x_{1}+11x_{1}+92x_{2}-x_{3}x_{1}^{2}-42x_{3}x_{2}x_{1}+45x_{4}^{2}-35\}$ over $\mathbb{Q}[x_{1},x_{2},x_{3},x_{4}]$ . Depending on the variable ordering, the efficiency of CAD computation can vary substantially [2]. For example, with ordering $x_{4}\succ x_{3}\succ x_{2}\succ x_{1}$ a CAD is constructed in 5s with 3,373 cells, while with $x_{3}\succ x_{1}\succ x_{4}\succ x_{2}$ a CAD is constructed in 93s with 43,235 cells.

3 Experimental Setup

In this section, we provide details about the datasets that we have used in our experiments, and we introduce a new augmented dataset designed to remove bias. In addition to the datasets, we provide details on the features considered based on England et al. [4] and Huang et al. [7] respective works, followed by a discussion of our labeling strategy. We conclude the section with a discussion of the machine learning models considered together with their respective hyperparameter setup.

3.1 MetiTarski Datasets

3.1.1 Dataset 1 (Original):

The first dataset is predominantly gathered from MetiTarski, by logging MetiTarski’s RCF subproblem queries during its proof search [10, 4]. The dataset contains 6,895 polynomial systems, together with data on the performance of a CAD-based decision procedure on different orderings of their variables. Every problem in this data set has 3 variables ( $x_{1},x_{2},x_{3}$ ) and thus 6 possible variable orderings.

3.1.2 Dataset 2 (Augmented):

We noted that the original MetiTarski dataset was highly imbalanced. While class [math], corresponding to the ( $x_{1},x_{2},x_{3}$ ) variable order contained 580-records, class $5$ , corresponding to the ( $x_{3},x_{2},x_{1}$ ) variable order contained 2,657-records, nearly 4-times more, Figure 1(a). We note that this may lead ML models to be biased towards certain label/s, thus preventing the models to learn relevant information and hinder their generalizability to new, previously unseen data samples.

We recognize that variables in the formula and ordering can be swapped without changing the time and cost needed to perform the computation. For instance, swapping variables $x_{1}$ with $x_{2}$ in the formulas and in the ordering leads to a CAD with the same time and cell cost. While this may seem apparent to a human or a machine reasoner that already has the ability to recognize this symmetry, for current machine learning systems it needs to be made explicit in the training data. This procedure resulted in a new augmented composed of 41,370 polynomial systems, Figure 1(b).

3.2 Feature Engineering and Labeling

3.2.1 Feature Engineering:

We process each set of polynomials extracting features enlisted in [4, 7], including the number of polynomials, maximum total degree of polynomials, maximum degree of each $x_{i}$ , and proportion of each $x_{i}$ appearing in polynomials and monomials.

3.2.2 Data Labeling:

In addition to the feature set, we assign to each polynomial problem a label ranging from $0...5$ , where each label translates to one of the $6$ -possible variable orderings. At present, the label for each polynomial problem corresponds to the variable ordering that takes the least amount of time.

3.3 Models

To evaluate our approach, we used $5$ -ML classifiers: 1) Support Vector Machines (SVM), 2) k-Nearest Neighbours (k-NN), 3) Decision Trees (DT), 4) Random Forests (RF), and 5) Multi-Layer Perceptrons (MLP). In their work, England et al. [4] rely on SVMs, k-NNs, DTs, and MLPs. By following a similar approach, we ensure that our strategy is comparable to the state-of-the-art and thus can be used as a foundation for future adoption of more complex ML strategies, such as Transformers [12] or Graph Neural Networks (GNNs) [11]. We used the scikit-learn111https://scikit-learn.org/stable/-based implementations of the ML algorithms. Similar to the works of England et al. [4] and Huang et al. [7], we employed a grid-search strategy with 5-fold cross-validation to identify the right parameter setup for each of the models.

4 Evaluation

In this section, we proceed with the evaluation of the performance of the selected machine learning models (cf. Section 3.3) on identifying the preferred (best) variable order for a given input problem. We transform the problem of determining the best variable order into a multi-class classification problem. In these kind of problems the training set is composed of $(x,y)$ tuples of data, where $x$ is an input sample and $y$ is the corresponding label for that sample. The goal of the learning process translates into the task of finding a function $f$ , such that $f(x)=y$ .

In our setting, the input data corresponds to the series of 11-features (cf. Section 3.2) whereas $y$ is one of the 6-labels from $[0,\mathellipsis,5]$ belonging to a preferred variable order from $[(x_{1},x_{2},x_{3}),\mathellipsis,(x_{3},x_{2},x_{1})]$ . The features were scaled by subtracting the mean and then scaling to unit variance222https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html.

4.0.1 Training:

For each of the datasets, we used $80\%$ of the data for training and kept the remaining $20\%$ as part of the testing set. For Dataset 1 (original), this corresponded to $5,516$ samples for training and $1,379$ for testing, whereas for Dataset 2 (augmented), $33,095$ samples were used during training and the remaining $8,274$ samples for testing. Tables 1 and 2 provide accuracy of each model obtained on the respective training set.

4.0.2 Testing:

We test each of the trained models on 20% of the respective datasets, i.e., on $1,379$ samples for models trained on Dataset 1 and $8,274$ samples for models trained on Dataset 2.

The performance of models trained on Dataset 1 varied from 53% for the MLP model up to 58.88% for the SVM, Table 1, these results being in-line with state-of-the-art work by England et al. [4] and Huang et al. [7]. Likewise, the models trained on the augmented Dataset 2 exhibit similar performance with the MLP architecture performing poorly with 49.62% accuracy and SVM obtaining up to 57.39% accuracy on the testing set, Table 2.

4.0.3 Evaluation on respective datasets:

Indeed, the performance of the models on the respective testing sets is quite promising and substantially better than random choice ( $\approx 16.67\%$ ). However, we observe that the original version of the collected MetiTarski data (Dataset 1) is highly imbalanced (cf. Section 3.1). Prior research in domains such as computer vision and natural language processing has emphasized the negative impacts of an imbalanced dataset, demonstrating that the accuracy exhibited by models trained on imbalanced data could be deceiving and hinder the capability of the model to learn meaningful information from the data. In most cases the model will exhibit a bias towards the label (class) which it has seen the most in the training set.

As such, it is interesting to investigate the performance of models trained on Dataset 1 (the original MetiTarski dataset) and the newly produced Dataset 2, with the latter being a superset of Dataset 1. As can be noticed in Table 1, there is a significant drop in classification accuracy for all the models trained on Dataset 1, with more than 25% drop in some cases. Models trained on the “debiased” Dataset 2 retain a good performance when evaluated on Dataset 1. We believe this is due to the training data being balanced and the model potentially seeing some of these samples during training, thus increasing its decision confidence.

5 Discussion and Future Work

We have re-examined a classical dataset for ML-driven CAD variable ordering and observed issues of bias. To address this, we have applied symmetry-based data augmentation to create a debiased version of the dataset333https://github.com/coproof/variable-ordering-revisited and have shown this improves generalizability. We believe that this phenomenon is quite general, and that debiasing using formula symmetries should be a standard tool for applications of ML in computer algebra, program verification, and other fields manipulating mathematical formulas. We plan to extend this work into general-purpose tools for ML-based variable ordering and apply it to many problem domains (e.g., Gröbner bases, BDDs, SAT, etc.).

5.0.1 Acknowledgements

This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR00112290064. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Government or DARPA.

Bibliography12

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] Akbarpour, B., Paulson, L.C.: Meti Tarski: An automatic theorem prover for real-valued special functions. Journal of Automated Reasoning 44 , 175–205 (2010)
2[2] Chen, C., Zhu, Z., Chi, H.: Variable ordering selection for cylindrical algebraic decomposition with artificial neural networks. In: Bigatti, A.M., Carette, J., Davenport, J.H., Joswig, M., de Wolff, T. (eds.) Mathematical Software – ICMS 2020. pp. 281–291. Springer International Publishing, Cham (2020)
3[3] Collins, G.E.: Quantifier elimination for real closed fields by cylindrical algebraic decompostion. In: Brakhage, H. (ed.) Automata Theory and Formal Languages. pp. 134–183. Springer Berlin Heidelberg, Berlin, Heidelberg (1975)
4[4] England, M., Florescu, D.: Comparing machine learning models to choose the variable ordering for cylindrical algebraic decomposition. In: International Conference on Intelligent Computer Mathematics. pp. 93–108. Springer (2019)
5[5] Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., Brendel, W.: Imagenet-trained CN Ns are biased towards texture; increasing shape bias improves accuracy and robustness. In: International Conference on Learning Representations (2019), https://openreview.net/forum?id=Bygh 9j 09KX
6[6] Geirhos, R., Temme, C.R., Rauber, J., Schütt, H.H., Bethge, M., Wichmann, F.A.: Generalisation in humans and deep neural networks. Advances in neural information processing systems 31 (2018)
7[7] Huang, Z., England, M., Wilson, D.J., Bridge, J., Davenport, J.H., Paulson, L.C.: Using machine learning to improve cylindrical algebraic decomposition. Mathematics in Computer Science 13 (4), 461–488 (2019)
8[8] Kawaguchi, K., Kaelbling, L.P., Bengio, Y.: Generalization in deep learning. ar Xiv preprint ar Xiv:1710.05468 (2017)