The Potential of Restarts for ProbSAT

Jan-Hendrik Lorenz; Julian Nickerl

arXiv:1904.11757·cs.DS·May 11, 2020

The Potential of Restarts for ProbSAT

Jan-Hendrik Lorenz, Julian Nickerl

PDF

TL;DR

This paper investigates the effectiveness of restart strategies for probSAT on near phase transition 3-SAT instances, demonstrating potential speedups and proposing a machine learning approach to optimize restart times.

Contribution

It introduces a machine learning pipeline to determine optimal restart times for probSAT, significantly improving its performance over existing strategies.

Findings

01

Potential speedup factor of 1.39 from empirical data

02

Weibull distribution fits runtime data for over 93% of instances

03

ProbSAT outperforms Luby's restart strategy with the proposed approach

Abstract

This work analyses the potential of restarts for probSAT, a quite successful algorithm for k-SAT, by estimating its runtime distributions on random 3-SAT instances that are close to the phase transition. We estimate an optimal restart time from empirical data, reaching a potential speedup factor of 1.39. Calculating restart times from fitted probability distributions reduces this factor to a maximum of 1.30. A spin-off result is that the Weibull distribution approximates the runtime distribution for over 93% of the used instances well. A machine learning pipeline is presented to compute a restart time for a fixed-cutoff strategy to exploit this potential. The main components of the pipeline are a random forest for determining the distribution type and a neural network for the distribution's parameters. ProbSAT performs statistically significantly better than Luby's restart strategy and…

Equations8

\hat{o pt} = mi n_{x \in X} (\frac{1 - p}{p} \cdot x + \frac{\sum _{{y \in X_{\leq x}}} y}{∣ X _{\leq x} ∣})

\hat{o pt} = mi n_{x \in X} (\frac{1 - p}{p} \cdot x + \frac{\sum _{{y \in X_{\leq x}}} y}{∣ X _{\leq x} ∣})

s_{Y_{k}}^{(i, j)} = \frac{E [ Y _{k}^{(i)} ]}{E [ Y _{k}^{(j)} ]} .

s_{Y_{k}}^{(i, j)} = \frac{E [ Y _{k}^{(i)} ]}{E [ Y _{k}^{(j)} ]} .

L(\hat{\sigma},\hat{\mu})=\Bigl{|}F(x\mid\hat{\sigma},\hat{\mu})-\textnormal{empirical}\Bigl{|}+\Bigl{|}\sigma-\hat{\sigma}\Bigl{|},

L(\hat{\sigma},\hat{\mu})=\Bigl{|}F(x\mid\hat{\sigma},\hat{\mu})-\textnormal{empirical}\Bigl{|}+\Bigl{|}\sigma-\hat{\sigma}\Bigl{|},

t_{i}=\left\{\begin{array}[]{ll}2^{k-1}&\mbox{if }i=2^{k}-1\\ t_{i-2^{k-1}+1}&\mbox{else}.\end{array}\right.

t_{i}=\left\{\begin{array}[]{ll}2^{k-1}&\mbox{if }i=2^{k}-1\\ t_{i-2^{k-1}+1}&\mbox{else}.\end{array}\right.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

11institutetext: Ulm University, Institute of Theoretical Computer Science, 89069 Ulm, Germany 11email: {jan-hendrik.lorenz,julian.nickerl}@uni-ulm.de

The Potential of Restarts for ProbSAT

Jan-Hendrik Lorenz

Julian Nickerl

Abstract

This work analyses the potential of restarts for probSAT, a quite successful algorithm for k-SAT [4, 5], by estimating its runtime distributions on random 3-SAT instances that are close to the phase transition. We estimate an optimal restart time from empirical data, reaching a potential speedup factor of 1.39. Calculating restart times from fitted probability distributions reduces this factor to a maximum of 1.30. A spin-off result is that the Weibull distribution approximates the runtime distribution for over 93% of the used instances well.

A machine learning pipeline is presented to compute a restart time for a fixed-cutoff strategy to exploit this potential. The main components of the pipeline are a random forest for determining the distribution type and a neural network for the distribution’s parameters. ProbSAT performs statistically significantly better than Luby’s restart strategy and the policy without restarts when using the presented approach. The structure is particularly advantageous on hard problems.

Keywords:

Satisfiability Heavy tail Machine learning ProbSAT Restart Runtime distribution

1 Introduction

Las Vegas algorithms often achieve the best performance for many hard problems. Hence, within these algorithms, many uncertain decisions have to be made. Usually, it cannot be guaranteed that these choices are the best or even good ones. Many bad decisions in a row may lead the algorithm into something like a local optimum that is hard to leave again. Once in such a state, it could be beneficial to restart the whole process and retry from the beginning, hoping that this time the algorithm makes better decisions.

Not all probabilistic algorithms can benefit from restarting, and even if they do in some instances, they might not in general. Additionally, since in many problems no precise information is available on how close the algorithm is to a solution, it is hard to say how long it should wait before restarting. It is known, that if restarts are beneficial for a specific instance, the runtime behavior of the algorithm on that instance can often be described with a probability distribution that possesses a so-called heavy-tail. A distribution is called heavy-tailed if its survival function decays slower than any exponential, i.e., the cumulative distribution function approaches 1 slowly. Gomes et al. [12] found that a particular kind of heavy-tails, so-called power-laws, describe the runtime behavior of combinatorial search procedures very well. They showed that the use of a restart strategy effectively eliminates the heavy-tailed behavior and greatly improves the runtime. Nowadays restarts are, for example, used in most state-of-the-art SAT solvers [7].

Two theoretically intriguing restart strategies are introduced by Luby et al. [19]: The fixed-cutoff approach and Luby’s universal strategy. The fixed-cutoff approach minimizes the expected runtime for some fixed restart time. However, finding this optimal restart time is a hard task and requires nearly complete knowledge of the algorithm’s behavior on the problem instance. Luby’s universal strategy does not require any domain knowledge. They even showed that there is no other approach without domain knowledge that performs better. Thus, while the fixed-cutoff approach is theoretically the best strategy, Luby’s universal strategy is usually preferred (for example [16]). Haim and Walsh [13] created a portfolio-solver using several restart strategies including the Luby and fixed-cutoff strategy. They apply machine-learning methods to decide which strategy is employed.

Lorenz [18] developed a method to calculate the optimal restart time for the fixed-cutoff approach if the runtime behavior is given by a probability distribution. Hoos and Stützle [15] use probability distributions to approximate empirically measured runtimes. This is called the (empirical) runtime distribution. Arbelaez et al. [3] studied the empirical runtime distribution of probSAT, a stochastic local search SAT-solver by Balint and Schöning [5], and found that lognormal distributions describe the runtime behavior well. A recent development in the field is trying to predict the runtime distributions of unseen instances with machine learning methods (see for example Arbelaez et al. [3], Eggensperger et al. [9]).

Our contribution: This work analyzes the capability of probability distributions to describe the runtime behavior of probSAT on random 3-SAT instances close to the phase transition. We approximate the runtime distributions of probSAT with a similar methodology as presented by Arbelaez et al. [3]. In our work, the generalized Pareto, the lognormal and the Weibull distribution are used to describe the runtime behavior. It is observed that especially the Weibull distribution is well suited to describe the runtime behavior of probSAT, describing significantly more empirical runtime distributions than the established lognormal distribution.

We estimate optimal restart times from the empirical data as a baseline to evaluate restart times from the fitted distributions. Applying this restart time to the observed data leads to an average speedup factor of $1.393$ . The aforementioned fitted distributions are used with Lorenz’s method to find theoretically optimal restart times. We found a potential speedup factor of up to $1.300$ by applying those restart times. To the best of our knowledge, the potential speedup by restarting at the optimal time according to the respective runtime distributions has not been systematically studied before.

In the second part of this work, a machine learning pipeline which predicts the runtime distribution of so far unseen instances is presented. It consists of a random forest predicting the distribution type, and a total of three neural networks which predict the parameters of the distributions. The parameters are used to calculate restart times. This approach differs from the procedure introduced by Haim and Walsh [13]. The strategies in their portfolio use restart times which are independent of the instance. We observe statistically significant speedups according to both the t-test and a modified Wilcoxon signed-rank test ([14, Chapter 7.12]). An average speedup factor of more than $1.21$ is observed in hard instances, without worsening the performance on easier ones. An overall speedup factor of $1.06$ is achieved.

2 Restarts and Runtime Distributions

This section introduces the concept of restarts and the considered runtime distributions.

2.1 Restarts

Throughout this work we use the so-called fixed cutoff strategy [19]: The algorithm is always restarted after the same number of steps, and the number of possible restarts is unbounded. It is known that there is an optimal fixed restart time $t^{*}$ which minimizes the expected runtime. If the restart time $t^{*}$ is used, then the fixed cutoff strategy is superior to all other restart approaches. However, finding the best possible restart time $t^{*}$ is a hard task.

2.2 Runtime Distributions

The runtime of a randomized algorithm can be interpreted as a random variable. The long-term behavior has been extensively studied in survival analysis. Even though runtimes are discrete values, it is common to model the performance of an algorithm as a continuous process to use the tools provided by survival analysis.

We follow this convention and choose to use the lognormal, the Weibull, and the generalized Pareto (GP) distribution to describe the runtime behavior of probSAT. All three types of distributions are often used in the fields of survival analysis and reliability engineering.

The lognormal, the Weibull and the GP distribution were already considered as suitable in previous research articles. Most notably Arbelaez et al. [3] observed that the runtime behavior of randomly generated 3-SAT instances with a clause to variable ratio of $4.2$ can be described by lognormal distributions. Other results favoring the lognormal distribution are given by Arbelaez et al. [2] and Truchet et al. [22]. The former paper found that the runtime distributions (RTDs) of other SAT-solvers, CCSAT and Sparrow, also follow lognormal distributions. The latter observed that the RTD of the CSP-solver Adaptive Search on magic-square-problem instances is well described by the lognormal distribution.

Algorithms that show a runtime behavior which resembles a Weibull distribution have also been observed before. Hoos and Stützle [15] investigated the performance of the GSAT algorithm for different noise parameters. They noted that the RTD is approximately a Weibull distribution if the noise parameter is less than the (empirically observed) optimal noise parameter. Frost et al. [10] argued that the RTDs of several CSP-solver can be modeled by Weibull distributions for satisfiable, binary instances.

Another often encountered phenomenon in the analysis of algorithms is the occurrence of power-laws. Power-law distributions are defined by their tail behavior: If the probability density function behaves like a polynomial, then the distribution follows a power-law. This property is usually expressed in terms of the Pareto distribution. Such distributions are mainly found in combinatorial search algorithms. Examples are graph coloring [17] and quasi-group completion [12]. In this work, the GP distribution is used in lieu of the Pareto distribution. It is a stronger type of distribution which includes the Pareto distribution and is suitable to fit the tail behavior due to the Pickands–Balkema–de Haan theorem [6].

Another often considered distribution is the exponential distribution. However, both the Weibull and the GP are generalizations of the exponential distribution.

We are interested in the calculation of optimal restart times. Lorenz [18] describes a method to find the optimal restart times for all three distribution types. However, he found that for the Weibull distribution and the GP distribution the optimal approach is either instantly restarting or not restarting at all depending on the parameters. In practice instantly restarting results in much worse performances. Therefore, the Weibull and the GP distribution are employed with a location parameter that shifts the whole distribution. This already resolves the paradox of instantly restarting, since the resulting (theoretically) optimal restart times are greater than the location parameter. The used RTDs are obtained from the empirical distribution with maximum likelihood estimations (fits).

Figure 1 shows the empirical RTD and a Weibull fit. Here, the Weibull distribution is the best fitting distribution, but there are some substantial differences between the observed and the fitted runtime behavior. The empirical RTD shows a steeper ascend in the beginning but also shows some irregularities at about $10^{8}$ flips. For the sake of demonstration, we picked an example where the fit does not describe the empirical data well. Most other instances are described better by their respective best fit.

A behavior as especially in Figure 1 can lead to predictions of the (theoretical) optimal restart time which are far from the actual optimal restart time. These results are discussed in more detail in the next section. An advantage of using fitted distributions is that the complete runtime behavior can be expressed in just two parameters for the lognormal distribution or three parameters for the Weibull and GP distribution with location parameter.

3 The Potential of Restarts in ProbSAT

Stochastic local search (SLS) algorithms commonly employ restart strategies when the number of local search steps exceeds a cutoff value. However, finding a choice for the cutoff value is a hard task and strategies which work well for one instance might be a bad choice for other instances. However, if the runtime behavior always follows a particular distribution type and only the parameters of the distribution vary, then several conclusions about useful restart strategies can be drawn. Thus, it can be useful to study the algorithm’s runtime behavior for a specific problem class and model the behavior with RTDs.

Here, we study an SLS SAT-solver called probSAT [5]. Information on the details of the probSAT algorithm can be found in Appendix 0.D. While probSAT is simple to implement it excels on random SAT formulas close to the phase transition. This can be seen in the parallel track of the SAT competition 2014 where a parallel version of probSAT [4] won. The probSAT algorithm is also used in the SAT-solver dimetheus [11] which is the best performing solver in the random track of the SAT competition in recent years.

The original version of probSAT does not restart, while the parallel version of probSAT uses one process with restart time $10^{7}$ flips and another process with restart time $10^{8}$ flips regardless of the size and the structure of the instance. The authors did not explain how this restart time was obtained. Thus, in this section, we systematically analyze the runtime behavior and show that there is still much optimizing potential in the probSAT algorithm by using a carefully crafted restart strategy. Furthermore, the analysis implies that the runtime behavior of probSAT can be well described by three distributions: The lognormal, the Weibull and the GP distribution. In fact, two distribution types suffice to calculate restart times which significantly improve the performance of probSAT.

Here, we study a version of probSAT which is known as the break-only-poly-version of probSAT111The original version can be found here: https://github.com/adrianopolus/probSAT. with $c_{b}=2.3$ which is suggested by Balint and Schöning [5] for random 3-SAT formulas. The used implementation is extended by the possibility to use Luby’s strategy and to use a timeout after a fixed number of flips.

3.1 Instance specification

For the experiments, random 3-SAT instances ranging from 1500 to 2500 variables are created using the generator kcnfgen [11] by Oliver Gableske. We decided to use random instances since they are the typical use case of SLS-solvers.

The number of clauses was chosen such that the clause to variable ratio is close to the phase transition at about 4.27, but mostly below this ratio, since only satisfiable formulas are of interest. We stayed slightly below the phase transition because the instances in this range tend to be “interesting” as they are satisfiable but often hard to solve. Formulas with lower ratios, while almost always satisfiable, are mostly trivial to solve. Above the phase transition, the formulas tend to be unsatisfiable. Most SLS solvers and especially probSAT are so-called incomplete solvers, i.e., if a formula is unsatisfiable, they do not terminate. Thus, the RTDs can only be studied on satisfiable instances. The smallest used ratio was 4.204, the largest 4.272.

The complete instance set consists of 2400 formulas, 1632 of which are satisfiable, ensured by the SAT-solver dimetheus [11]. Note that this can be seen as filtering since formulas on which dimetheus performs well are preferred. Only the 1632 formulas guaranteed to be satisfiable are used for the further steps. We decided to use dimetheus for this task since it is currently one of the best performing solvers on random instances.

3.2 Empirical Distributions

For an instance $i$ , the goal is to find a good approximation of its RTD $X_{i}$ on probSAT. For this, we sample the random variable $X_{i}$ 300 times to ensure stable results, measuring the number of flips until a satisfying assignment is found. This measure is chosen because it is stable and independent of hardware and scheduling. From this, we gain an empirical distribution function $\hat{F}_{300}(X_{i})$ . The solving is done with the help of Sputnik by Völkel et al. [23]. The maximum likelihood fits on the empirical distribution yield an estimation of the parameters for a lognormal, a Weibull and a GP distribution. The Kolmogorov-Smirnov test (KS-test) is applied to compare these new distributions with $X_{i}$ . Note, that passing the KS-test is not a proof of that distribution being the correct one. We only use it as a goodness-of-fit argument to see how well the actual distribution is described.

Only ten out of all 1632 instances could not be classified as any of the distributions according to the KS-test at a significance level of 0.05. The distribution with the highest p-value is designated as the “winning” distribution. Table 1 displays how often the distributions won or passed the KS-test.

It is noticeable that the Weibull distribution (W) describes about 93.7% of the observed RTDs well. This value increases to above 97% at a significance level of 0.01.

Furthermore, 78.0% of the instances can be described by a lognormal distribution (L). This supports the observations of Arbelaez et al. [2], who report that 389 of their 500 instances (77.8%) are described well by the lognormal distribution, with the same approach and at a significance level of 0.05.

On a side note, by applying the same procedure to the (shifted) exponential distribution, we observed that it covers 67.7% of the instances in the shifted and only 52.3% in the unshifted case.

3.3 Calculating the Potential

We approximate an optimal expected runtime using the following approach. Let $X$ be the set of observed runtimes of a fixed instance, and $X_{\leq x}$ the set of elements from $X$ that are smaller than $x$ . Then

[TABLE]

with $p=\frac{|X_{\leq x}|}{|X|}$ . This is the minimum expected runtime if restarted at an observed runtime given the measured data. A standard tool to compare performances is given by the speedup. Let $Y_{k}$ be any fixed instance and $Y_{k}^{(i)}$ be the runtime of algorithm $i$ on instance $Y_{k}$ . Then the speedup $s_{Y_{k}}^{(i,j)}$ is defined as:

[TABLE]

The geometric mean $GM=\left(\prod_{k=1}^{m}s_{Y_{k}}^{(i,j)}\right)^{\frac{1}{m}}$ is suitable to argue about the average speedup on $m$ instances with speedups $s_{Y_{1}}^{(i,j)},\dots,s_{Y_{m}}^{(i,j)}$ . With this, the average speedup of using the optimal restart time compared to not restarting is 1.393 which serves as a baseline for further comparisons. We interpret this value as the maximum speedup reachable with a fixed-cutoff restart strategy. This value in itself is already a strong result: If one knew the optimal restart times, one could improve the performance of probSAT by about 39%.

The speedup for the different distributions is calculated in a similar way. However, the considered restart time is now calculated from the parameters of the distribution as proposed by Lorenz in [18]. We calculate the speedup for each subset of the set of distribution types {lognormal (L), Weibull (W), GP}. Within each subset, we distinguish between two strategies. The first uses the distribution that performs best under the KS-test (highest p-value). The other chooses the distribution with the highest speedup. For both cases, this is decided for each instance individually. The results are displayed in Table 2.

Clearly a lot of potential is already lost just by the representation of the data as fitted probability distributions. Even in the best case, the speedup is lower by about 0.1 compared to the baseline. Additionally, we can see, that a single distribution type does not suffice to express the observed RTDs. Overall, the combination of Weibull and lognormal seems to be most stable, performing far better than any other combination in the KS-Test column, and being close to the best combination in the speedup column.

A reason for the observed discrepancy between the baseline and all other speedups could be that the fitted distributions are too smooth as can be seen in the figure discussed in section 2.2. Describing the runtime behavior with few parameters does not suffice to capture all necessary details. Furthermore, the used sampling method does not provide any specifics on very short runs and the probabilities associated with them. As a side note, restarting at $10^{7}$ or $10^{8}$ flips as proposed in [4] yields worse results than any of the distributions. For some instances the chosen restart time is too low by several magnitudes which deteriorates the performance.

4 From Theory to Practice

The previous sections argued about the speedup while estimating it from data obtained by running probSAT without any restarts. While this gives an idea of what is possible, it does not explain how we can use this information in practice. It is possible to model the estimation of the optimal restart time as a regression problem. Nevertheless, this approach is limited: Shylo et al. [21] found that optimal restart times are generally increasing with an increasing number of processors. Since the regressor is trained with data obtained from a fixed number of processors, its estimations are only accurate for the same number of processors. Estimating the RTDs is more flexible: The estimates can be easily used to find the optimal restart time on an arbitrary number of processors. Other scenarios can also benefit from the fits, like predicting the probability of completion within a deadline. We choose to predict the RTDs since it is a flexible model. However, obtaining the parameters of the fitted distribution is a non-trivial task. Eggensperger et al. [9] propose a machine learning pipeline for this task, which we adapt to fit our setting.

There are two steps in the estimation of the parameters: First, a random forest estimates the type of the distribution. Second, based on the type, a neural network trained specifically for that distribution type predicts the distribution’s parameters.

We experimented with several combinations of distribution types and found that the combination of lognormal and Weibull leads to the best results. While the previously observed potential speedup of the GP distribution is comparable to the other distributions, its network performed significantly worse. For this reason, the upcoming experiments omit the GP distribution.

We chose the final model of each component by comparing its performance on a test set. However, we did not measure its performance based on the respective loss function, but on the potential speedup, calculated in the same way as described in the previous section. The potential speedup is not suitable as a loss function for technical reasons. A lot of extra information is required to calculate the speedups, leading to very high runtimes in the learning process. The networks predict the distribution instead of the restart times since this work focuses on how well probSAT performs with restart times calculated from fitted RTDs.

The remainder of this section describes the applied features and details of the machine learning components. The quality of the components is measured on a test set. The test set consists of a sample of 162 instances where instances with exceptionally high or low shape parameters are oversampled on purpose. These were sampled from the 1632 instances mentioned in section 3.1. During training, only the remaining 1470 instances were used.

4.1 Feature Extraction

The SATzilla feature extractor by Xu et al. [25] creates the features of the instances (as motivated by Arbelaez et al. [3]). It is called with the parameters -base, -ls and -lobjois, leading to a total of 81 features. A description of the features is found in [24]. After normalizing the features to have minimum 0 and maximum 1, we selected the features with a variance larger than 0.05 and an additional four handpicked features. A total of 34 features remained. The used features are provided in the appendix.

4.2 Random Forest

The distribution type of an instance is estimated by a random forest. The label is the distribution that performed better under the KS-test (higher p-value). The random forest does not use normalized features. Normalization was tested as well, but no significant difference in the result was observed. The random forest was evaluated by ten iterations of a 10-fold cross-validation. The average potential speedup measured during the cross-validation is 1.218, further details of the evaluation of the random forest can be found in the appendix. We implemented the random forest with Python’s scikit-learn [20] library.

4.3 Neural Networks

Neural networks (NNs) predict the shape, scale and location parameter of the distributions, whereas the location parameter is only predicted for the Weibull distribution. Both distribution types use a separate NN. The experiments indicate that the inclusion of the location parameter into the network with the other parameters significantly worsens the overall performance. Therefore we trained a separate network to predict the location parameter. The networks are implemented in Python with the Keras framework [8] and tensorflow backend [1]. The specifications of the networks can be found in the appendix.

A standard loss function is the root-mean-square error (RMSE) of a prediction wrt. the ground truth. It was applied in the location network. Nevertheless, in the other NNs, two parameters have to be trained. It is possible to measure the sum of the two RMSE values for shape and scale, but the shape and the scale parameters are clearly related, thus the relationship between both parameters would be lost. Hence, it is advisable to use a loss function which captures the relationship between both parameters. Eggensperger et al. [9] use the negative log-likelihood function for measuring the quality of a fit. We also experimented with this function but achieved better results by using a different loss function based on the KS-Test. The loss function is presented in equation 3. Here, $\sigma$ is the shape and $\mu$ the scale parameter. The labels are the value of the maximum-likelihood estimation for $\sigma$ , one observed runtime $x$ and the associated probability of the empirical distribution:

[TABLE]

where $F$ is the corresponding cumulative distribution function. Here, $\hat{\sigma}$ and $\hat{\mu}$ are the predictions by the NN. This means that the network tends to predictions which minimize the absolute difference between the cumulative distribution function and the empirical distribution. It is also possible to define the loss function without the latter part $|\sigma-\hat{\sigma}|$ , but in our experiments, this term improved the predictions of the shape parameters significantly.

Lorenz [18] found that for all distributions the usefulness of restarts only depends on the shape parameters. Thus, it is especially important to predict the shape parameters of extreme cases well for the intended use case.

The performance of the NNs is checked and verified by ten iterations of a 10-fold cross-validation. The measured average potential speedups are 1.137 for the Weibull and 1.127 for the lognormal distribution. The average RMSE of the location network is 0.714.

A remarkable behavior during the evaluation of the Weibull network for shape and scale is linked to the location parameter. Estimating parameters that lead to a restart time that is below the location parameter drastically reduce the calculated speedup. A reason for this is that below the location parameter there are no observed runtime available. Thus, the density of the RTD is unknown. To be able to calculate any speedup, we approximate the behavior by an exponential function, so the density tends towards zero fast when reaching values lower than the location parameter. During the above evaluations, instances for which the network calculated a restart time below the location parameter are omitted. However, this occurred only five times out of 162 instances. Evaluating the whole pipeline on the test set lead to a remaining potential speedup of 1.157.

5 Experimental Results

We compare three different models: A fixed-cutoff strategy where the restart times are calculated by using the parameters from the NN. Note, that this method might lead to no restarts. In the following evaluation, this approach is denoted by “static restarts”. Secondly, Luby restarts are considered. The Luby strategy theoretically optimal and thus a good baseline. When using Luby restarts, the $i$ -th term $t_{i}$ can be calculated by the following formula.

[TABLE]

Usually, an initialization term $a$ is multiplied on $t_{i}$ to obtain the restart time $T_{i}$ . We use Luby restarts initialized with $20n$ , where $n$ is the number of variables, this restart time is suggested by Balint and Schöning [5]. Finally, not restarting the algorithm is considered.

Each of the approaches described above is tested on 100 new satisfiable instances generated with the same settings as defined in section 3.1. The runtimes for each instance are sampled 100 times, and the average runtimes of each strategy are compared. For this experiment, a timeout of $10^{11}$ flips was used for the static restart strategy. Only one instance was affected by the cutoff. For this instance the static restart strategy found a solution in 72 out of 100 runs, the Luby strategy found a solution in 47 cases and the not restarting approach in 28 cases. The following analysis only considers the remaining 99 instances. Again, we measure the runtime by the number of variable flips until a satisfying assignment is found.

The comparison of the logarithmically scaled runtimes between not restarting, static restarts and Luby restarts is illustrated in Figure 2. The static restart strategy clearly outperforms the Luby strategy on all but two instances. The comparison with not restarting is harder to interpret. It can be seen that for easy instances there is barely any difference in the performance of the strategies. However, for intermediate and hard instances the static restart strategy outperforms not restarting in most cases. The parameters of probSAT were tuned without restarting with a short timeout. Therefore, it is not surprising that easy instances do not profit from restarting. Hard instances, on the other hand, can potentially show a heavy-tailed behavior, i.e., probSAT’s parameters are not chosen optimally for the case without restarts.

The comparison between the static and the Luby strategy yields an average speedup of $2.725$ . The comparison between the static restart strategy and not restarting yields an overall average speedup of $1.063$ . This speedup includes instances where the static strategy predicted not restarting. If only instances are considered where the static strategy restarted, then an average speedup of $1.108$ is achieved on the remaining $69$ instances.

Figure 3 shows that the speedup scales well with the expected runtime: When considering the $33$ instances with the longest runtimes, a speedup of $1.216$ is obtained. Finally, the speedups are tested with the t-test and a modified Wilcoxon signed-rank test ([14, Chapter 7.12]). When all instances are considered, the static strategy performs statistically significantly better than both the Luby strategy (t-test $p<2\mathrm{e}{-50}$ , Wilcoxon $p<1\mathrm{e}{-50}$ ) and not restarting (t-test $p=0.0044$ , Wilcoxon $p=0.0174$ ). When only instances are considered for which restarts are applied, then the static strategy is still significantly better than Luby’s strategy (t-test $p<3\mathrm{e}{-35}$ , Wilcoxon $p<3\mathrm{e}{-13}$ ) and not restarting (t-test $p<6\mathrm{e}{-4}$ ,Wilcoxon $p<1\mathrm{e}{-3}$ ). We conclude that our system combines the advantages of not restarting on easy instances while also significantly improving the performance on hard problems.

6 Conclusion & Outlook

This work analyzes the potential speedup of restarts for the probSAT algorithm on random 3-SAT instances close to the phase transition. A major result is that if the best restart time is chosen according to the data, then an average speedup factor of $1.393$ will be obtained. We proceed to approximate the runtime behavior with continuous probability distributions. Three distribution types, the Weibull, the generalized Pareto and the lognormal distribution, are identified which describe the runtime behavior well. The (shifted) Weibull distribution performs best among these three as it describes the runtime distributions of $93.7\%$ instances well. The representations are used to calculate the (theoretically) best restart times; these predictions are used to measure the potential speedup of probSAT. An average speedup factor of up to $1.300$ can be achieved. We find it surprising how much potential probSAT has with respect to restarts since the used probSAT implementation is optimized towards utilizing the algorithm without any restarts regarding the expected runtime [5]. Modifications that lead to the runtime distribution having a heavy-tail more often could increase the expected runtime without restarts, but the potential speedup with restarts could be increased further. A handle for this modification in probSAT is, for example, the function that chooses the next variable to flip.

Since the generalized Pareto distribution worsened the results it is omitted in the following. The observations and the approximated runtime distributions are employed to train a random forest and neural networks. The random forest is used to distinguish between the distributions while the neural networks are used to predict the parameters of the distributions. The predictions are used to decide whether restarts are useful. If they are, then the optimal restart time is calculated, and the fixed-cutoff strategy is used. Otherwise, probSAT is not restarted. We compare this approach with the Luby strategy and not restarting. The observations show that our approach combines the good behavior of probSAT on easy and intermediate instances while considerably improving the performance on hard instances. The presented approach is statistically significantly better than both Luby’s strategy and not restarting. An average speedup of $1.063$ is obtained on all instances and an average speedup of $1.216$ on the $33$ of $99$ test instances with the longest runtime where restarts are applied.

While the presented procedure already demonstrates the high potential of the approach, there are still open ends for further research. A different set of features might improve the quality of the predictions. Features which describe runs of probSAT could be suitable candidates. Furthermore, the used restart strategy is very optimistic, assuming that the estimated restart time is, in fact, the optimal one. More sophisticated strategies that are aware of inaccuracies of the network could lead to further improvements, especially if the estimations are far off the actual values.

Naturally, our approach is not limited to probSAT. Any Las Vegas algorithm can be analyzed and optimized with the same technique. The results imply that well-performing algorithms can be further improved with our approach, especially on hard instances.

7 Acknowledgements

The authors acknowledge support by the state of Baden-Württemberg through bwHPC. Furthermore, we thank Gunnar Völkel for many insightful discussions in the field of statistics and support with Sputnik.

Appendix 0.A Features

All machine learning components use 34 features. The neural networks use them in a normalized form with minimum 0 and maximum 1. A description of the features can be found in [24].

The following are the features, grouped based on [24]. The number in brackets indicates the number of corresponding features.

•

Problem Size Features:

(2)

Number of variables and clauses in the original formula

(2)

Number of variables and clauses after simplification

•

Variable-Clause Graph Features:

(3)

Variable node degree statistics (mean, min, max)

(3)

Clause node degree statistics (mean, min, max)

•

Variable Graph Features:

(3)

Node degree statistics (mean, min, max)

•

Clause Graph Features:

(2)

Node degree statistics (mean, max)

•

Proximity to Horn Formula:

(2)

Number of occurences in a Horn clause for each variable (mean, min)

•

DPLL Probing Features:

(1)

Search space size estimate

•

Local Search Probing Features, based on 2 seconds of runnich each of SAPS and GSAT:

(10)

Number of steps to the best local minimum in a run

(2)

Average improvement to best in a run

(3)

Fraction of improvement due to first local minimum

•

Timing Features:

(1)

CPU time require for feature computation

The internal names of the features are:

nvarsOrig, nclausesOrig, nvars, nclauses, VCG-CLAUSE-mean, VCG-CLAUSE-min,

VCG-CLAUSE-max, VCG-VAR-mean, VCG-VAR-min, VCG-VAR-max, HORNY-VAR-mean,

HORNY-VAR-min, VG-mean, VG-min, VG-max, CG-mean, CG-max, CG-featuretime,

saps_BestSolution_Mean, saps_BestSolution_CoeffVariance,

saps_FirstLocalMinStep_Mean, saps_FirstLocalMinStep_CoeffVariance,

saps_FirstLocalMinStep_Median, saps_FirstLocalMinStep_Q.10,

saps_FirstLocalMinStep_Q.90, saps_BestAvgImprovement_Mean,

gsat_BestSolution_Mean, gsat_FirstLocalMinStep_Mean,

gsat_FirstLocalMinStep_CoeffVariance, gsat_FirstLocalMinStep_Median,

gsat_FirstLocalMinStep_Q.10, gsat_FirstLocalMinStep_Q.90,

gsat_BestAvgImprovement_Mean, lobjois-mean-depth-over-vars

Appendix 0.B Specifications and Validation of the Random Forest

The parameters were set to

n_estimators = 50, criterion = "entropy",

all other parameters were default.

Validating the random forest by ten iterations of a 10-fold cross validation resulted in a average potential speedup of 1.218. Evaluating the final model on an independent set of 162 instances resulted in a speedup of 1.138. The evaluation of the forests on other metrics is displayed in Table 3. All values concern the Weibull distribution as the label.

We were not able to get these values significantly higher. Especially an accuracy of below 75% is clearly not a desirable result. However, we still used the random forest, since the potential speedup was still promising. An explanation for this is that often both lognormal and Weibull describe the distribution well and therefore produce good restart times. Hence, classfying the wrong distribution does not necessarily have a negative effect on the potential speedup.

Appendix 0.C Specification of the Neural Networks

All networks used Adam as the optimizer, with initial learning rate 0.0005 and clipnorm 0.5. The inputs were processed in batches of 16.

The parameters of the networks are displayed in Table 4. Regarding the number of output neurons: While the relevant number of neurons was two, we had to use three in the lognormal and four in the Weibull network for technical reasons.

Appendix 0.D ProbSAT

ProbSAT [5] is a stochastic local search algorithm for solving the satisfiability problem. It starts with a randomly generated initial assignment. If the current assignment does not satisfy the formula, then ProbSAT randomly chooses an unsatisfied clause $C=(v_{1}\lor v_{2}\lor\dots)$ . Changing the value of the assignment for $v_{i}$ causes $C$ to be satisfied, this is called a flip (of $v_{i}$ ). ProbSAT flips $v_{i}$ with probability $\frac{f(v_{i})}{\sum_{k=1}^{l}f(v_{k})}$ , where $f(x)=\frac{\textnormal{make}(x)^{c_{m}}}{(1+\textnormal{break}(x))^{c_{b}}}$ . The make value is the number of clauses which become satisfied if $x$ is flipped, while the break value is the number of clauses which become unsatisfied if $x$ is flipped. After a variable is chosen and flipped the new assignment is checked. If the new assignment still does not satisfy the formula, then a new variable is chosen from another unsatisfied clause. This step is repeated until either a satisfying assignment is found or until the number of flips exceed a certain $max\_flips$ value. If the $max\_flips$ value is exceeded, then ProbSAT is restarted with a new random assignment.

Note, that the function $f$ contains two parameters $c_{b}$ and $c_{m}$ . Balint and Schöning [5] found that $c_{b}=2.3$ and $c_{m}=0$ are good choices for random $3-$ SAT formulas close to the phase transition. These parameter settings are also used in this work.

Bibliography25

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] Abadi, M., et al.: Tensor Flow: Large-scale machine learning on heterogeneous systems (2015), https://www.tensorflow.org/ , software available from tensorflow.org
2[2] Arbelaez, A., Truchet, C., Codognet, P.: Using sequential runtime distributions for the parallel speedup prediction of SAT local search. Theory and Practice of Logic Programming 13 (4-5), 625–639 (2013)
3[3] Arbelaez, A., Truchet, C., O’Sullivan, B.: Learning Sequential and Parallel Runtime Distributions for Randomized Algorithms. In: ICTAI 2016, San Jose, California, USA. pp. 655–662. IEEE (2016)
4[4] Balint, A., Schöning, U.: probsat and pprobsat. SAT Competition 2014 , 63 (2014)
5[5] Balint, A., Schöning, U.: Choosing Probability Distributions for Stochastic Local Search and the Role of Make versus Break. In: SAT 2012. pp. 16–29. Lecture Notes in Computer Science, Springer, Berlin, Heidelberg (2012)
6[6] Balkema, A., De Haan, L.: Residual Life Time at Great Age. The Annals of probability pp. 792–804 (1974)
7[7] Biere, A., Heule, M., van Maaren, H.: Handbook of satisfiability. IOS press (2009)
8[8] Chollet, F., et al.: Keras. https://github.com/keras-team/keras (2015)