Learning to Sample Hard Instances for Graph Algorithms

Ryoma Sato; Makoto Yamada; Hisashi Kashima

arXiv:1902.09700·cs.LG·October 4, 2019


TL;DR

This paper introduces HiSampler, a machine learning-based probabilistic generator for hard graph instances, enabling diverse and significantly more challenging benchmarks for graph algorithms without relying on domain-specific rules.

Contribution

It presents the first machine learning approach to model and sample the distribution of hard instances for graph problems, surpassing rule-based methods.

Findings

01 Generated instances are a few to several orders of magnitude harder than random ones in many settings.

02 HiSampler outperforms rule-based algorithms in the 3-coloring problem.

03 The method models hard instance distributions without hand-engineered features.


Tables

Table 1: Qualitative comparison with other methods for generating hard instances.

	Random	Rule-based	Genetic Algorithm	HiSampler
Effective		✓	✓	✓
Without hand-engineering	✓		✓	✓
Problem-agnostic	✓		✓	✓
Distribution	✓			✓
Sample-efficient		✓		✓

Table 2: Notations.

Notation	Description
$G$	A graph (i.e., an instance)
$V$	The set of all nodes
$n$	The number of nodes (i.e., $n = |V|$)
$\boldsymbol{A} \in \{0, 1\}^{n(n-1)/2}$	The upper triangular part of an adjacency matrix
$\boldsymbol{P} \in [0, 1]^{n(n-1)/2}$	The upper triangular part of a probabilistic adjacency matrix
$L$	A graph algorithm
$\text{hardness}(\boldsymbol{A}, L) \in \mathbb{R}$	The hardness value of graph $\boldsymbol{A}$ for algorithm $L$
$B$	The maximum number of evaluations
$l$	The number of layers of the neural network
$d_{i}$ $(i = 0, 1, \dots, l)$	The dimensions of the layers of the neural network

Table 3: Experimental results. Each entry is the hardness value of the hardest instance found by the method, averaged over 5 runs.

Problem	3-coloring		Vertex Cover
Algorithm	DSATUR	MiniSat	B&B
n	50	200	50
$p^{*}$	0.1	0.025	0.1
HiSampler-vanilla	261331027.8	1120.4	8145.2
HiSampler-PER	610024238.8	2674.2	21376.4
Genetic Algorithm	2464.8	660.8	8127.6
Erdős-Rényi $p = p^{*}$	407.0	693.6	3259.2
Erdős-Rényi $p = 0.1$	407.0	351.8	3259.2
Erdős-Rényi $p = 0.5$	2.0	282.4	2227.8
Erdős-Rényi $p = 0.9$	2.0	276.0	1160.8
Cheeseman et al. (1991)	597.8	810.2	N/A
Hogg and Williams (1994)	3883.8	815.8	N/A
Vlasie (1995)	240867.8	708.2	N/A
Mizuno and Nishihara (2008)	166294.6	875.4	N/A

Equations

$$\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} + \alpha r \frac{\partial}{\partial \boldsymbol{\theta}} \sum_{i=1}^{n(n-1)/2} \left( \log \boldsymbol{P}_{i}^{\boldsymbol{A}_{i}} + \log (1 - \boldsymbol{P}_{i})^{(1 - \boldsymbol{A}_{i})} \right)$$


Code

Repository: joisino/HiSampler (official)


Full text

ACML 2019 (volume 101, 2019)

Learning to Sample Hard Instances for Graph Algorithms

Ryoma Sato, Makoto Yamada, Hisashi Kashima

Kyoto University, Kyoto 606-8501, Japan

RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan

Abstract

Hard instances, which require a long time for a specific algorithm to solve, help (1) analyze the algorithm in order to accelerate it and (2) build a good benchmark for evaluating the performance of algorithms. There have been several efforts to generate hard instances automatically. For example, evolutionary algorithms have been used to generate hard instances. However, they generate only a finite number of hard instances. The merit of such methods is limited because it is difficult to extract meaningful patterns from a small number of instances. We seek a probabilistic generator of hard instances. Once the generative distribution of hard instances is obtained, we can sample a variety of hard instances to build a benchmark, and we can extract meaningful patterns of hard instances from the sampled instances. The existing methods for modeling the hard instance distribution rely on parameters or rules that are found by domain experts; however, they are specific to the problem. Hence, it is challenging to model the distribution for general cases. In this paper, we focus on graph problems. We propose HiSampler, the hard instance sampler, to model the hard instance distribution of graph algorithms. HiSampler makes it possible to obtain the distribution of hard instances without hand-engineered features. To the best of our knowledge, this is the first method to learn the distribution of hard instances using machine learning. Through experiments, we demonstrate that our proposed method can generate instances that are a few to several orders of magnitude harder than the random-based approach in many settings. In particular, our method outperforms rule-based algorithms in the 3-coloring problem.

Keywords: hard instances, graph problems, neural networks, reinforcement learning

Editors: Wee Sun Lee and Taiji Suzuki

1 Introduction

Given an algorithm for a combinatorial problem, how do we find instances that take a long time to solve? We call such instances hard instances. (The term is sometimes used for instances that take a long time for any algorithm to solve; in this paper, it refers only to instances that take a long time for a specific algorithm.) Finding hard instances is important in algorithm design for the following reasons:

• Reason 1: Hard instances help analyze and accelerate the algorithm.

• Reason 2: Hard instances help evaluate the performance of other algorithms.

The following illustrative example shows how hard instances help analyze an algorithm. Consider the sorting problem and a quicksort that uses the first element as the pivot. Suppose we do not know that the worst-case time complexity of quicksort is $\Theta(n^{2})$ . If we run quicksort on some random sequences, it seems to solve all instances in $O(n\log n)$ time. However, once a hard instance $[n,n-1,\dots,3,2,1]$ is found, we can see that the worst-case time complexity is $\Theta(n^{2})$ . Moreover, such an observation shows that choosing the pivot more carefully improves the algorithm (Reason 1). Besides, such an extreme case is useful for benchmarks because it reveals whether algorithms are robust to worst cases or are efficient only in the average case (Reason 2). More generally, if a benchmark contains hard instances for state-of-the-art algorithms, we can quickly check whether a new algorithm overcomes the weaknesses of those algorithms (Reason 2).
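The quicksort example can be checked with a small script. The sketch below is only an illustration and is not part of the paper's code: it counts element comparisons for a first-element-pivot quicksort on a random permutation and on the reversed sequence. The partition scheme, the function name `quicksort_comparisons`, and the choice $n=200$ are assumptions made for this example.

```python
import random


def quicksort_comparisons(seq):
    """Sort a copy of `seq` with first-element-pivot quicksort.

    Returns (sorted_list, number_of_element_comparisons).
    An explicit stack replaces recursion so that deep, unbalanced
    splits do not hit Python's recursion limit.
    """
    items = list(seq)
    comparisons = 0
    stack = [(0, len(items) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = items[lo]      # the first element of the range is the pivot
        boundary = lo          # last index that holds a value < pivot
        for j in range(lo + 1, hi + 1):
            comparisons += 1
            if items[j] < pivot:
                boundary += 1
                items[boundary], items[j] = items[j], items[boundary]
        items[lo], items[boundary] = items[boundary], items[lo]
        stack.append((lo, boundary - 1))
        stack.append((boundary + 1, hi))
    return items, comparisons


n = 200
rng = random.Random(0)
random_input = rng.sample(range(1, n + 1), n)
reversed_input = list(range(n, 0, -1))

_, random_cost = quicksort_comparisons(random_input)
_, reversed_cost = quicksort_comparisons(reversed_input)
print("random:", random_cost, "reversed:", reversed_cost)
```

On the reversed input every partition is maximally unbalanced, so the count equals $n(n-1)/2$, whereas the random permutation needs far fewer comparisons.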

Finding hard instances is helpful not only in academic settings but also in practical and industrial ones. For example, a task scheduler solves the vertex coloring problem to optimize schedules. If a user inputs a malicious schedule (i.e., a hard instance), the scheduler takes a significant amount of time to solve the problem and may hang. If the developers have such inputs beforehand, they can cope with the issue by setting an appropriate timeout period or a maximum input size. Another example is the preparation of competitive programming contests. If we create hard instances for each problem, we can accurately check whether a submission is correct, which is useful for preparing competitions. The advantages of knowing hard instances are discussed further in Cotta and Moscato (2003), van Hemert (2006), and Smith-Miles et al. (2010).

There have been several efforts to generate hard instances automatically. For example, Cotta and Moscato (2003), van Hemert (2006), and Smith-Miles et al. (2010) generate hard instances using evolutionary algorithms. However, they generate only a finite number of hard instances. The merit of such methods is limited because it is difficult to extract meaningful patterns from a small number of instances. We seek a probabilistic generator of hard instances. Once the generative distribution of hard instances is obtained, we can sample a variety of hard instances from the distribution to build a benchmark, and we can extract meaningful patterns of hard instances from the sampled instances.

To tackle this problem, we must specify the underlying set over which the distribution is defined. However, the form of instances depends on the problem. For example, an instance is an array in the sorting problem and a set of clauses in the SAT problem. In this paper, we focus on graph problems to fix the underlying set. Graph problems arise in many important applications. For example, the register allocation problem is formulated as a graph coloring problem (Chaitin et al., 1981), and the maximum clique problem can be used for community detection (Luce and Perry, 1949). This motivates us to focus on graph algorithms.

A straightforward method for modeling the hard graph distribution is to use a probabilistic graph model such as the Erdős-Rényi model (Erdős and Rényi, 1959). Although this approach is generic and simple, it is not efficient because the worst-case time complexity is often far worse than the average-case time complexity (Wilf, 1984). This indicates that, to model the hard graph distribution efficiently, we must develop a method that can capture the structure of the problem and generate rare instances.

In this paper, we propose HiSampler, the hard instance sampler, to obtain a generative distribution of hard instances of graph algorithms. It models the hard graph distribution using a neural network and trains the model via reinforcement learning.

Through experiments on seven algorithms for four typical graph problems, we demonstrate that HiSampler can generate instances that are a few to several orders of magnitude harder than those of the random-based approach in many settings, and that our method outperforms rule-based algorithms in the 3-coloring problem. The implementation of HiSampler is publicly available at https://github.com/joisino/HiSampler as an open-source project.

The major contributions of this paper are as follows:

• Novel formulation: We formulate the problem of modeling the hard instance distribution, which is practically important.

• Novel method: We propose HiSampler, an effective method to model the hard instance distribution for a given graph algorithm.

• Experimental evidence: We demonstrate the effectiveness of HiSampler through extensive experiments using seven algorithms of four problems.

Table 1 contrasts HiSampler against other methods for generating hard instances.

2 Proposed Method

We first describe the problem setting of this paper. Then, we propose HiSampler, an effective method to learn the distribution of hard instances for graph algorithms.

2.1 Problem Setting

We specify the task of learning the hard instance distribution of graph algorithms. In particular, we develop a method that models the hard instance distribution for algorithms on undirected, unweighted, simple graphs. We choose this class because it covers many important problems. For example, the register allocation problem is formulated as a graph coloring problem (Chaitin et al., 1981), and the maximum clique problem can be used for community detection (Luce and Perry, 1949).

We aim to develop a method that relies on no problem-specific properties; instead, we use only a hardness measure of the problem instances, $\textrm{hardness}({\boldsymbol{A}},L)$ , where $L$ is the given algorithm, ${\boldsymbol{A}}\in\{0,1\}^{n(n-1)/2}$ is the upper triangular part of the adjacency matrix, and $n$ is the number of vertices. The hardness value can be designed arbitrarily as long as it can be obtained by actually running the algorithm on the instance. For example, in our experiments, the hardness is measured by the number of recursive calls that DSATUR (Brélaz, 1979) makes to solve the instance and by the real time that Nauty (McKay and Piperno, 2014) spends to solve the instance.

Formally, given an algorithm $L$ , we aim to develop a method that models a generative distribution $\mathcal{D}_{L}$ of graphs that maximizes $\mathbb{E}_{{\boldsymbol{A}}\sim\mathcal{D}_{L}}[\text{hardness}({\boldsymbol{A}},L)]$ . In addition, the method is designed under the following two key assumptions.

Assumption 1. Small instance: We fix the number of vertices $n$ because we could otherwise generate arbitrarily hard instances just by increasing the number of vertices, which is not practical. Moreover, since small instances can be visualized and are easy to interpret and analyze, it is important to generate hard instances without increasing the size of the instance.

Assumption 2. Sample efficiency: Evaluating the hardness value is time-consuming, especially when the instance is hard. Moreover, algorithms that require special hardware such as GPUs or multiple cores are costly to run even if they finish in a short time. Therefore, we cannot evaluate too many instances, which motivates us to find hard instances efficiently. To this end, we set an evaluation budget $B$ ; in other words, we do not evaluate more than $B$ instances during training. It should be noted that evolutionary algorithms are not sample-efficient because they evaluate the fitness function for a large population in each iteration.

Table 2 summarizes the notations we use throughout the paper.

2.2 Hard Instance Sampler

We propose HiSampler to model the distribution of hard instances of graph algorithms. Figure 1 illustrates the overview of HiSampler.

Probabilistic Model: We consider distributions on the binary vectors $\{0,1\}^{n(n-1)/2}$ because a graph $G$ is represented by the upper triangular part of its adjacency matrix ${\boldsymbol{A}}\in\{0,1\}^{n(n-1)/2}$ . HiSampler models the distribution by a fully-connected neural network $N$ with parameters ${\boldsymbol{\theta}}$ . Let $l$ denote the number of layers of $N$ and let $d_{i}$ $(i=0,1,\dots,l)$ be the dimensions of the layers of $N$ , where $d_{0}$ is the input dimension and $d_{l}=n(n-1)/2$ is the output dimension. The neural network $N$ takes a noise vector ${\boldsymbol{z}}\sim\mathcal{N}(0,I_{d_{0}})$ drawn from the standard normal distribution as input and outputs a probabilistic adjacency matrix ${\boldsymbol{P}}\in[0,1]^{n(n-1)/2}$ . Then, an adjacency matrix ${\boldsymbol{A}}$ is sampled from $\text{Bernoulli}({\boldsymbol{P}})$ . Namely, $\Pr[{\boldsymbol{A}}_{i}=1\mid{\boldsymbol{P}}]={\boldsymbol{P}}_{i}$ $(i=1,2,\dots,n(n-1)/2)$ , and the dimensions of ${\boldsymbol{A}}$ are conditionally independent given ${\boldsymbol{P}}$ . It is worth noting that ${\boldsymbol{A}}_{i}$ and ${\boldsymbol{A}}_{j}$ $(i\neq j)$ are not independent unconditionally because ${\boldsymbol{P}}_{i}$ and ${\boldsymbol{P}}_{j}$ are not independent. Therefore, HiSampler can model nonlinear relationships between edges.
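The sampling path from noise to graph can be sketched in a few lines of dependency-free Python. This is a toy illustration under stated assumptions, not the released implementation: the layer sizes are far smaller than in the paper, the helper names (`init_layer`, `forward`, `sample_graph`) are hypothetical, and the row-major ordering of vertex pairs is one possible convention for the upper triangular part.

```python
import math
import random


def init_layer(rng, fan_in, fan_out):
    """Return (weights, biases): Xavier-style uniform weights, zero biases."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    weights = [[rng.uniform(-limit, limit) for _ in range(fan_in)]
               for _ in range(fan_out)]
    return weights, [0.0] * fan_out


def forward(layers, z):
    """ReLU on hidden layers, sigmoid on the output: maps noise z to P."""
    h = z
    for index, (weights, biases) in enumerate(layers):
        pre = [sum(w * x for w, x in zip(row, h)) + b
               for row, b in zip(weights, biases)]
        is_last = index == len(layers) - 1
        h = [1.0 / (1.0 + math.exp(-v)) if is_last else max(0.0, v)
             for v in pre]
    return h


def sample_graph(rng, probs, n):
    """Draw A ~ Bernoulli(P); return the undirected edges (u, v) with u < v."""
    pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
    return [pair for pair, p in zip(pairs, probs) if rng.random() < p]


rng = random.Random(0)
n = 6
dims = [4, 8, n * (n - 1) // 2]   # toy sizes chosen for this sketch
layers = [init_layer(rng, a, b) for a, b in zip(dims, dims[1:])]
z = [rng.gauss(0.0, 1.0) for _ in range(dims[0])]
P = forward(layers, z)            # one probability per vertex pair
edges = sample_graph(rng, P, n)
print(len(P), edges)
```

Because every entry of `P` is computed from the same `z`, two edges are correlated across samples even though they are drawn independently once `P` is fixed.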

Optimization: HiSampler optimizes the parameters of the neural network $N$ using immediate reinforcement learning. In this framework, the environment gives a noise vector ${\boldsymbol{z}}$ to the agent, the agent generates a graph ${\boldsymbol{A}}$ as an action, and the cost of solving the instance is fed back to the agent as a reward $r$ (i.e., $r=\textrm{hardness}({\boldsymbol{A}},L)$ ). The policy of the agent is modeled by the neural network $N$ . We optimize the parameters of $N$ using the REINFORCE algorithm (Williams, 1992):

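$$\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} + \alpha r \frac{\partial}{\partial \boldsymbol{\theta}} \sum_{i=1}^{n(n-1)/2} \left( \log \boldsymbol{P}_{i}^{\boldsymbol{A}_{i}} + \log (1 - \boldsymbol{P}_{i})^{(1 - \boldsymbol{A}_{i})} \right),$$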

where $\alpha$ is the learning rate. The procedure used to train the neural network model is shown in Algorithm 1.
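To make the REINFORCE update concrete, the following sketch applies it to a deliberately simplified policy in which the parameters are the pre-sigmoid logits themselves rather than the weights of a network. That simplification, the stand-in reward `toy_hardness` (the number of edges), and all function names are assumptions of this example. For a sigmoid output, the derivative of the log-likelihood with respect to logit $s_i$ is $\boldsymbol{A}_{i}-\boldsymbol{P}_{i}$, which is what `reinforce_step` uses.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def log_likelihood(logits, graph):
    """log Pr[A | P] with P = sigmoid(logits), summed over vertex pairs."""
    total = 0.0
    for s, a in zip(logits, graph):
        p = sigmoid(s)
        total += a * math.log(p) + (1 - a) * math.log(1.0 - p)
    return total


def reinforce_step(logits, graph, reward, lr):
    """theta <- theta + lr * reward * d/dtheta log Pr[A | P].

    With theta equal to the logits, the gradient entry is A_i - P_i.
    """
    return [s + lr * reward * (a - sigmoid(s)) for s, a in zip(logits, graph)]


def toy_hardness(graph):
    """Hypothetical stand-in reward: the number of edges in the graph."""
    return float(sum(graph))


rng = random.Random(0)
logits = [0.0] * 10   # P_i = 0.5 for every vertex pair at the start
for _ in range(200):
    graph = [1 if rng.random() < sigmoid(s) else 0 for s in logits]
    logits = reinforce_step(logits, graph, toy_hardness(graph), lr=0.01)
print([round(sigmoid(s), 2) for s in logits])
```

In HiSampler the same gradient is propagated through the network $N$ by backpropagation, and the reward comes from running the target algorithm.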

Prioritized Experience Replay: The main challenge in learning the distribution of hard instances is that hard instances are sparse in the instance space. This tendency is especially pronounced for sophisticated algorithms. To alleviate this issue, we use a variant of prioritized experience replay (Schaul et al., 2016). Namely, we maintain an experience pool that contains the top- $K$ hardest instances. In each iteration, we sample an instance from the pool and train the model using the sampled experience. We refer to HiSampler with prioritized experience replay as HiSampler-PER to distinguish it from HiSampler-vanilla. The training procedure of HiSampler-PER is shown in Algorithm 2. We simplify the original prioritized experience replay by (1) sampling uniformly from the top- $K$ instances instead of using weighted sampling and (2) using the reward instead of the TD error to prioritize experiences. Although this variant is simple, we found that it works well in practice for HiSampler. We empirically demonstrate the effectiveness of this method in Section 5.
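A top- $K$ pool with uniform sampling can be kept in a binary min-heap. The class below is a minimal sketch of that idea; the name `TopKPool`, its methods, and the placeholder graphs are hypothetical and not taken from the paper's code.

```python
import heapq
import random


class TopKPool:
    """Keeps the K hardest (reward, graph) pairs seen so far in a min-heap."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []      # entries are (reward, insertion_id, graph)
        self._next_id = 0    # tie-breaker so graphs are never compared

    def add(self, reward, graph):
        entry = (reward, self._next_id, graph)
        self._next_id += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif reward > self._heap[0][0]:
            # Evict the easiest stored instance in O(log K).
            heapq.heapreplace(self._heap, entry)

    def sample(self, rng):
        """Uniform choice over the pool (no priority weighting)."""
        reward, _, graph = rng.choice(self._heap)
        return reward, graph

    def rewards(self):
        return sorted(entry[0] for entry in self._heap)


pool = TopKPool(capacity=3)
for reward in [5.0, 1.0, 9.0, 3.0, 7.0]:
    pool.add(reward, graph=("placeholder graph", reward))
print(pool.rewards())   # [5.0, 7.0, 9.0]
print(pool.sample(random.Random(0)))
```

The heap root is always the easiest instance in the pool, so deciding whether a new instance enters the pool is a single comparison.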

2.3 Complexity Analysis

We analyze the time complexity of HiSampler. The bottleneck of HiSampler is the forward and backward computation of the neural network, which takes $O(\sum_{i=0}^{l-1}d_{i}d_{i+1})$ time per iteration. Therefore, for fixed hidden dimensions, the time complexity of HiSampler is $O(n^{2})$ per iteration with respect to the graph size because the dimension of the output layer is $d_{l}=n(n-1)/2$ . The additional computation needed for HiSampler-PER is the maintenance of the priority queue, which takes $O(\log K)$ time per iteration and is negligible in practice.
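As a quick worked example of the sum $\sum_{i}d_{i}d_{i+1}$, the snippet below counts the weights of the network for a few graph sizes. The default widths $10$, $100$, and $500$ are the ones reported in the experimental setup of Section 5; the function name is hypothetical.

```python
def multiply_adds_per_pass(n, hidden=(10, 100, 500)):
    """Sum of d_i * d_{i+1} for layer widths hidden + (n(n-1)/2,)."""
    dims = list(hidden) + [n * (n - 1) // 2]
    return sum(a * b for a, b in zip(dims, dims[1:]))


for n in (64, 256, 1024):
    print(n, multiply_adds_per_pass(n))
# 64 1059000
# 256 16371000
# 1024 261939000
```

The last layer dominates the count, and its size grows quadratically in $n$, which matches the $O(n^{2})$ bound.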

3 Extensions

In our proposed method, the choice of the hardness function is arbitrary. Therefore, HiSampler can find not only instances that take a long time to be solved but also hard instances in terms of other criteria. We introduce two important examples.

3.1 Estimating Approximation Ratio

Let $L$ be an approximation algorithm, let ${\boldsymbol{A}}$ be an instance of the problem, let $L({\boldsymbol{A}})$ be the objective value of the solution that $L$ outputs for ${\boldsymbol{A}}$ , and let $\text{OPT}({\boldsymbol{A}})$ be the optimal objective value for ${\boldsymbol{A}}$ . The approximation ratio of $L$ is defined by $\displaystyle r(L)=\max_{{\boldsymbol{A}}}\frac{L({\boldsymbol{A}})}{\text{OPT}({\boldsymbol{A}})}$ for a minimization problem and by $\displaystyle r(L)=\max_{{\boldsymbol{A}}}\frac{\text{OPT}({\boldsymbol{A}})}{L({\boldsymbol{A}})}$ for a maximization problem, where the maximum is taken over all instances ${\boldsymbol{A}}$ . Estimating approximation ratios is important for investigating the performance of approximation algorithms. However, it is not obvious which instance maximizes the ratio. Here, we use $\frac{L({\boldsymbol{A}})}{\text{OPT}({\boldsymbol{A}})}$ or $\frac{\text{OPT}({\boldsymbol{A}})}{L({\boldsymbol{A}})}$ as the hardness value of the instance ${\boldsymbol{A}}$ ; then, we can search for the maximizer with HiSampler. We will show an illustrative example with the well-known minimum vertex cover algorithm in Section 5.5.
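A ratio-based hardness function for minimum vertex cover could look like the sketch below. It assumes that the approximation algorithm is the classic maximal-matching 2-approximation, which may differ from the algorithm used in Section 5.5, and it computes $\text{OPT}$ by exhaustive search, which is feasible only for small $n$. All function names are hypothetical.

```python
from itertools import combinations


def matching_cover(edges):
    """2-approximation: take both endpoints of a greedy maximal matching."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


def optimal_cover_size(n, edges):
    """Smallest vertex cover size by exhaustive search (small n only)."""
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            if all(u in chosen or v in chosen for u, v in edges):
                return size
    return n


def ratio_hardness(n, edges):
    """L(A) / OPT(A) for this minimization problem; 1.0 when OPT is 0."""
    opt = optimal_cover_size(n, edges)
    if opt == 0:
        return 1.0
    return len(matching_cover(edges)) / opt


star = [(0, 1), (0, 2), (0, 3)]      # the optimal cover is {0}
triangle = [(0, 1), (0, 2), (1, 2)]  # the optimal cover has two vertices
print(ratio_hardness(4, star))       # 2.0
print(ratio_hardness(3, triangle))   # 1.0
```

The star attains the worst possible ratio of 2 for this algorithm, which is the kind of instance a ratio-maximizing sampler would be rewarded for finding.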

3.2 Hard Instances for Enumerating Algorithms

Enumerating algorithms output all the elements that satisfy some property. The amortized time and maximum delay are sometimes investigated for evaluating the efficiency of enumerating algorithms. HiSampler can generate hard instances in terms of the amortized time and maximum delay by setting these measures as the hardness value.

4 Related Work

Constructing Hard Instances: There have been several studies on constructing hard instances of combinatorial problems. Hard instance generation was first studied in relation to phase transition phenomena (Cheeseman et al., 1991; Hogg and Williams, 1994), where order parameters are used to generate hard instances. The 3-coloring instance generation by Mizuno and Nishihara (2008) and the graph isomorphism instance generation by Neuen and Schweitzer (2017) generate hard instances with rule-based algorithms. However, these works depend on problem-specific knowledge, whereas our method is independent of the problem. Another approach is to generate instances by reduction from other related problems. For example, the Latin square problem has been found useful for constructing benchmarks for the graph coloring problem (Gomes and Shmoys, 2002) and SAT (Achlioptas et al., 2000). However, converting an instance of the Latin square problem into an instance of another problem also requires problem-specific knowledge. The methods most closely related to this paper are evolutionary algorithms (Cotta and Moscato, 2003; van Hemert, 2006; Smith-Miles et al., 2010), which optimize the hardness of instances. However, they require designing a gene representation for each task, and they find a finite set of hard instances rather than a distribution.

Deep Generative Graph Models: Recently, several generative graph models utilizing deep learning techniques have been proposed. The variational graph auto-encoder (Kipf and Welling, 2016) is one of the first models of this kind. It is a variant of the Variational Auto-Encoder (VAE) that outputs a probabilistic adjacency matrix. This model was used for link prediction in citation networks. Subsequently, generative models based on VAEs (Simonovsky and Komodakis, 2018; Grover et al., 2019; Ma et al., 2018), Generative Adversarial Networks (GANs) (Wang et al., 2018; Bojchevski et al., 2018), and sequential generation (You et al., 2018b; Liu et al., 2018; You et al., 2018a) were proposed. In particular, they succeeded in generating various de novo chemical materials and modeling real-world networks. ORGAN (Guimaraes et al., 2017) utilizes SeqGAN (Yu et al., 2017) and reinforcement learning to generate molecular graphs with desired properties. It uses SMILES (Weininger, 1988) to represent a molecular graph because SeqGAN generates a sequence of symbols rather than a graph itself. MolGAN (Cao and Kipf, 2018) is another graph generative model utilizing a GAN and reinforcement learning. It models the probabilistic adjacency matrix and the attributes of graphs directly instead of using SMILES. The differences between HiSampler and deep generative graph models are that (1) these models use training data that contain graphs with high objective values, whereas HiSampler uses no training data, and (2) many of these models are designed for generating molecular graphs, which typically have at most a few dozen nodes, whereas HiSampler can generate graphs with more than a hundred nodes.

5 Experiments

We will answer the following questions through experiments:

Q1. Scalability: How fast is HiSampler?

Q2. Effectiveness: Does HiSampler generate harder instances than existing methods?

Q3. Knowledge Extraction: Can HiSampler provide insights for algorithm design?

Q4. Diversity: Is the distribution of HiSampler rich in diversity?

Q5. Extensions: Can HiSampler estimate an approximation ratio?

Q6. Effective Patterns: How can we extract effective patterns from the obtained hard graph distribution?

**Common Experimental Setup:** We set the number of layers of HiSampler to three and the layer dimensions to $d_{0}=10$ , $d_{1}=100$ , $d_{2}=500$ , and $d_{3}=n(n-1)/2$ throughout the experiments. The activation function in the hidden layers is ReLU, and the final output is passed through a sigmoid activation. We use Adam (Kingma and Ba, 2015) with a learning rate of $0.0001$ to train the model. We set the pool size of HiSampler-PER to $K=10$ throughout the experiments. We conduct the experiments on an Intel Xeon E5-2690 CPU. It should be noted that the computation of HiSampler can be sped up with GPUs, but we do not use GPUs for a fair comparison.

5.1 Scalability

As mentioned in Section 2.3, the complexity of HiSampler is $O(n^{2})$ . We measure the time consumed by training and sampling through experiments. We sample $100$ instances from each of $10$ HiSamplers and execute one training step for each sample. We exclude the time for evaluating the hardness value during training because this evaluation overhead is shared with the other methods. Furthermore, we regard training as complete once the evaluation time dominates the model computation. If evaluation already takes a long time for the initial instances, we should reduce the graph size because generating small instances is important (Assumption 1 in Section 2.1).

Figure 3 reports the mean time of a single iteration of sampling and training. This shows that the computation does not grow much even if the number of nodes increases. In particular, one iteration of the training takes only four seconds even with $1024$ nodes. It indicates that HiSampler is highly efficient.

5.2 Effectiveness

We examine how hard the instances generated by HiSampler are compared with those of existing methods. We use the 3-coloring problem, the minimum vertex cover problem, the maximum clique problem, and the graph isomorphism problem, with the following seven algorithms for these problems:

DSATUR (3-coloring): This is a backtracking search method based on DSATUR (Brélaz, 1979). It assigns colors to the vertices one by one. At each step, it chooses one of the uncolored vertices that have the fewest candidate colors. If there are several such vertices, it chooses the one with the maximum degree. If the color assignment becomes inconsistent, it backtracks until it finds a consistent assignment. Unlike the original DSATUR heuristic, this method always outputs an exact solution. We use the number of recursive calls as the hardness value.
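The description above translates into a short backtracking routine. The sketch below is one possible reading of it and may differ in detail from the implementation used in the experiments (for example, it performs no symmetry breaking); the function name `dsatur_calls` and the edge-list input format are assumptions.

```python
def dsatur_calls(n, edges, colors=3):
    """Exact DSATUR-style search; returns (is_colorable, recursive_calls)."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    color = [None] * n
    calls = 0

    def candidates(v):
        used = {color[u] for u in adj[v] if color[u] is not None}
        return [c for c in range(colors) if c not in used]

    def search():
        nonlocal calls
        calls += 1
        uncolored = [v for v in range(n) if color[v] is None]
        if not uncolored:
            return True
        # Fewest candidate colors first; ties are broken by larger degree.
        v = min(uncolored, key=lambda x: (len(candidates(x)), -len(adj[x])))
        for c in candidates(v):
            color[v] = c
            if search():
                return True
            color[v] = None   # undo the assignment and backtrack
        return False

    return search(), calls


triangle = [(0, 1), (0, 2), (1, 2)]
k4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
print(dsatur_calls(3, triangle))   # (True, 4)
print(dsatur_calls(4, k4))
```

The second return value plays the role of $\textrm{hardness}({\boldsymbol{A}},L)$ for this algorithm.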

MiniSat (3-coloring): This method reduces the 3-coloring problem to the SAT problem and solves the reduced instance using MiniSat (Eén and Sörensson, 2003). We use the number of decisions that MiniSat reports as the hardness value.
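The text does not specify which encoding is used for the reduction. A common direct encoding, shown here purely as an assumed example, introduces one Boolean variable per (vertex, color) pair and emits clauses as DIMACS-style lists of signed integers, which is the input convention of SAT solvers such as MiniSat. The solver itself is not invoked in this sketch.

```python
def three_coloring_cnf(n, edges, colors=3):
    """CNF clauses (lists of signed ints) for coloring a graph.

    Variable var(v, c) is true when vertex v receives color c.
    """
    def var(v, c):
        return v * colors + c + 1   # DIMACS variables start at 1

    clauses = []
    for v in range(n):
        # Each vertex gets at least one color.
        clauses.append([var(v, c) for c in range(colors)])
        # Each vertex gets at most one color.
        for c1 in range(colors):
            for c2 in range(c1 + 1, colors):
                clauses.append([-var(v, c1), -var(v, c2)])
    for u, v in edges:
        # Adjacent vertices never share a color.
        for c in range(colors):
            clauses.append([-var(u, c), -var(v, c)])
    return clauses


triangle = [(0, 1), (0, 2), (1, 2)]
cnf = three_coloring_cnf(3, triangle)
print(len(cnf))   # 21
```

With three colors this yields $4n+3|E|$ clauses over $3n$ variables.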

B&B (minimum vertex cover): This is a branch and bound algorithm that uses a greedy maximal matching as an upper bound. We use the number of recursive calls as the hardness value.

BK (maximum clique): This is a branch and bound algorithm based on the Bron-Kerbosch method (Bron and Kerbosch, 1973). This prunes the state when the union of the selected nodes and candidate nodes is smaller than the maximum clique found so far. Note that the original Bron-Kerbosch method enumerates all the maximal cliques whereas this algorithm only outputs a maximum clique. We use the number of recursive calls as the hardness value.

MCS (maximum clique): This is MCS (Tomita et al., 2010), a branch and bound algorithm. We use the running time as the hardness value (in units of $10^{-2}$ sec).

FMC (maximum clique): This is Fast Max-Clique Finder (Pattabiraman et al., 2013), an algorithm based on hierarchical pruning. We use the running time as the hardness value (in units of $10^{-6}$ sec).

Nauty (graph isomorphism): This is Nauty (McKay and Piperno, 2014), one of the state-of-the-art graph isomorphism solvers. We use the running time as the hardness value (in units of $10^{-6}$ sec).

To compare the effectiveness of HiSampler, we use the following baseline methods.

Genetic algorithm: This searches for hard instances using an evolutionary algorithm. We use the adjacency matrix ${\boldsymbol{A}}$ as the gene representation. We use the same hyperparameters as van Hemert (2006). Namely, the population size is $30$ , crossover is performed uniformly, mutation occurs with uniform probability with an adaptive mutation rate, and the fitness is the hardness value.

Random graphs: This samples $B$ graphs from the Erdős-Rényi model (Erdős and Rényi, 1959) and reports the hardest one.
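The random baseline amounts to a best-of- $B$ search over Erdős-Rényi samples. A minimal sketch follows, with a toy hardness function (the edge count) standing in for a real solver; the function names and the `hardness(n, graph)` signature are assumptions of this example.

```python
import random


def erdos_renyi(rng, n, p):
    """Sample G(n, p): each vertex pair becomes an edge with probability p."""
    return [(u, v) for u in range(n) for v in range(u + 1, n)
            if rng.random() < p]


def hardest_random_graph(hardness, n, p, budget, seed=0):
    """Evaluate `budget` random graphs and keep the hardest one."""
    rng = random.Random(seed)
    best_value, best_graph = float("-inf"), None
    for _ in range(budget):
        graph = erdos_renyi(rng, n, p)
        value = hardness(n, graph)
        if value > best_value:
            best_value, best_graph = value, graph
    return best_value, best_graph


def edge_count(n, graph):
    """Hypothetical stand-in for hardness(A, L)."""
    return len(graph)


value, graph = hardest_random_graph(edge_count, n=8, p=0.3, budget=20)
print(value, graph)
```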

Rule-based: We use several rule-based methods for the 3-coloring problem and the graph isomorphism problem. Cheeseman et al. (1991) and Hogg and Williams (1994) used the Erdős-Rényi model with carefully tuned parameters for the 3-coloring problem (i.e., $p=\frac{4.6}{n-1}$ and $p=\frac{3.4}{n-1}$ , respectively). Vlasie (1995) found that a regular structure plays a key role in hard instances and generated graphs with fewer 3-paths for the 3-coloring problem. Mizuno and Nishihara (2008) and Neuen and Schweitzer (2017) used characteristic gadgets to construct hard instances. We generate $B$ graphs using these methods and report the hardest one.

The most important step for HiSampler and the genetic algorithm is initialization. It is known that an algorithm takes a long time for graphs within a certain range of edge density and a short time for graphs with other densities (Cheeseman et al., 1991). For example, 3-coloring algorithms can easily determine that there is no solution for dense graphs. If the initial distribution of HiSampler or the initial population of the genetic algorithm is far from the hard region, it takes a long time for them to generate hard instances. To alleviate this problem, we determine beforehand the edge probability $p^{*}$ at which each algorithm takes a long time to process graphs. We can use prior knowledge about the algorithm to determine $p^{*}$ . Alternatively, if we do not have such knowledge, we can evaluate random instances with different edge probabilities (e.g., $p=0.01,0.02,\dots,0.99$ ) and use the probability that yields the hardest instance as $p^{*}$ . This consumes a negligibly small part of the budget. We initialize the population of the genetic algorithm by the Erdős-Rényi model with $p=p^{*}$ , and we initialize the bias of the last layer of HiSampler as $b=-\log(\frac{1}{p^{*}}-1)$ so that $\sigma(b)=p^{*}$ , where $\sigma$ is the sigmoid function. We initialize the weight matrices of HiSampler with the Xavier initializer (Glorot and Bengio, 2010) and the biases of the lower layers with zeros, which is the default setting of the library. It should be noted that all hyperparameters other than the graph size $n$ and the initial edge probability $p^{*}$ are fixed throughout the experiments.
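The bias formula is the logit of $p^{*}$, which the following few lines verify numerically for the edge probabilities that appear in Table 3. The function names are hypothetical.

```python
import math


def initial_output_bias(p_star):
    """b = -log(1/p* - 1), the logit of p*, so that sigmoid(b) equals p*."""
    return -math.log(1.0 / p_star - 1.0)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


for p_star in (0.025, 0.1, 0.5):
    b = initial_output_bias(p_star)
    print(p_star, round(b, 4), round(sigmoid(b), 4))
```

When the last-layer weights are small, the pre-sigmoid output is close to this bias, so the initial edge probabilities are close to $p^{*}$.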

We set the budget to $B=10000$ , and we stop a method when it takes more than a day. We measure the hardness value of the hardest instance that each method finds. We run each method 5 times with different seeds and report the mean of the 5 runs. Table 3 summarizes the results of the experiments. We make the following observations.

Observation 1. Prioritized experience replay is effective: HiSampler-PER consistently outperforms HiSampler-vanilla except for FMC, where HiSampler-vanilla slightly outperforms HiSampler-PER. It indicates that prioritized experience replay works well for HiSampler.

Observation 2. HiSampler-PER outperforms the genetic algorithm: HiSampler-PER consistently outperforms the genetic algorithm, especially for the DSATUR algorithm. This shows that HiSampler learns the hard distribution effectively.

Observation 3. HiSampler outperforms random-based methods: HiSampler consistently outperforms the Erdős-Rényi model with $p=p^{*}$ . It demonstrates that the distribution of HiSampler is not random. HiSampler learns effective structure of the hard graph distribution.

Observation 4. HiSampler outperforms rule-based methods: HiSampler consistently outperforms rule-based methods. It indicates that HiSampler can find highly effective structure for hard instances that could not be found manually.

It should be noted that we measured the hardest instance that each method found because the genetic algorithm aims to search for a hard instance rather than to model the hard instance distribution. We will investigate the properties of the distribution (e.g., diversity and pattern mining) in the later experiments.

5.3 Knowledge Extraction

We demonstrate, using a concrete example, how a hard instance provides helpful insight for designing a better search algorithm. Figure 3 shows an example of the instances that HiSampler generates. This instance has no solution because it contains a 4-clique $C$ (highlighted in orange). However, DSATUR cannot explicitly detect 4-cliques. It first assigns colors to $V\backslash C$. Every time it finds a solution for $V\backslash C$, the partial solution is immediately rejected when the algorithm starts to color the 4-clique $C$. The search then backtracks and the algorithm looks for other assignments of $V\backslash C$. However, none of them succeeds because every assignment is rejected by the 4-clique $C$. Finally, the algorithm has enumerated all the valid assignments of $V\backslash C$ and reports that there are no solutions for this instance. The key point is that the 4-clique $C$ is connected to $V\backslash C$ by a path $P$ (highlighted in blue). $P$ plays the role of a “bottleneck”. While the algorithm is coloring $V\backslash(C\cup P)$, the number of color candidates of each vertex of $P$ is at most two, and the degree of each is only two. Therefore, DSATUR is reluctant to color these vertices. From this analysis, we can improve the backtracking search by preprocessing: deleting vertices whose degree is at most two. Deleting such vertices does not change the answer because, with three colors available, a vertex whose degree is at most two can always be colored regardless of how the other vertices are colored: simply give it a color that differs from the colors of its adjacent vertices. This improvement helps avoid the problem described above. This discussion is a good example showing that analyzing a hard instance helps design an algorithm that is robust to hard instances (this corresponds to Reason 1 in Section 1).
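
The preprocessing step suggested by this analysis can be sketched as follows. This is an illustrative sketch, not code from the paper: the function names and the adjacency-dictionary representation are assumptions, and the deletion is repeated until no vertex of degree at most two remains, which goes slightly beyond the single deletion pass described in the text but preserves 3-colorability by the same argument.

```python
def build_adjacency(edges):
    """Build an undirected adjacency dict {vertex: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def prune_low_degree(adj, max_degree=2):
    """Repeatedly delete vertices whose degree is at most max_degree.

    Returns a new adjacency dict; the input is left untouched. For 3-coloring
    (max_degree=2), the reduced graph is 3-colorable iff the original one is.
    """
    adj = {v: set(ns) for v, ns in adj.items()}
    stack = [v for v, ns in adj.items() if len(ns) <= max_degree]
    while stack:
        v = stack.pop()
        if v not in adj:  # already deleted via an earlier stack entry
            continue
        for u in adj.pop(v):
            adj[u].discard(v)
            if len(adj[u]) <= max_degree:
                stack.append(u)
    return adj


# A 4-clique {0, 1, 2, 3} with a short path 3-4-5 attached to it.
clique = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
path = [(3, 4), (4, 5)]
reduced = prune_low_degree(build_adjacency(clique + path))
```

In this toy example the path vertices are deleted and only the 4-clique remains, so a backtracking search run on `reduced` meets the unsolvable core immediately instead of first enumerating colorings of the rest of the graph.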

5.4 Diversity

We show that a variety of hard instances are sampled from the distribution that HiSampler learns. Diversity of hard instances helps build a benchmark and extract meaningful patterns. We train HiSampler-vanilla for the DSATUR algorithm. The hardness value of the hardest instance $x^{*}$ that HiSampler finds is $1637666819$. We sample $1000$ instances from the distribution whose edge-set Jaccard index with $x^{*}$ is less than $0.7$. The mean of the hardness values of these instances is $3943028.974$, which is still harder than the random models and the rule-based methods, and the mean of the Jaccard indices is $0.646$. Moreover, the hardness value of the hardest instance among them is $819309215$, with a Jaccard index of $0.694$. This shows that HiSampler retains diversity, whereas the genetic algorithm only generates a fixed number of instances.
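
The Jaccard-based filtering used above can be sketched as follows, assuming each graph is represented by its set of undirected edges. The function names and the convention that two empty graphs have index $1$ are assumptions made for this illustration.

```python
def normalize_edges(edges):
    """Represent each undirected edge as a sorted tuple so (u, v) equals (v, u)."""
    return frozenset(tuple(sorted(e)) for e in edges)


def jaccard_index(edges_a, edges_b):
    """Jaccard index |A & B| / |A | B| of two edge sets."""
    a, b = normalize_edges(edges_a), normalize_edges(edges_b)
    union = a | b
    if not union:
        return 1.0  # convention assumed here: two empty graphs are identical
    return len(a & b) / len(union)


def filter_dissimilar(samples, reference, threshold=0.7):
    """Keep only the sampled graphs whose Jaccard index with `reference` is below threshold."""
    return [s for s in samples if jaccard_index(s, reference) < threshold]
```

For example, `filter_dissimilar(samples, x_star)` would keep the sampled instances that share fewer than 70% of the edges in the union with the hardest instance, where `samples` and `x_star` are hypothetical variables holding edge lists.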

5.5 Extensions

We show an illustrative example of estimating the approximation ratio of the greedy algorithm for the minimum vertex cover problem. It is known that the approximation ratio of the greedy algorithm is $2$. We use the Erdős-Rényi model with $p=0.1$ and HiSampler-vanilla with $p^{*}=0.1$. We set the number of vertices to $n=50$. The other settings are the same as in the previous experiments. We use $r({\boldsymbol{A}})=\exp(10\cdot L({\boldsymbol{A}})/\text{OPT}({\boldsymbol{A}}))$ as the hardness value, which is monotonically increasing in $L({\boldsymbol{A}})/\text{OPT}({\boldsymbol{A}})$. We found that the slope of $L({\boldsymbol{A}})/\text{OPT}({\boldsymbol{A}})$ itself is too gentle to train the model, so we used this exponential objective instead. We ran $5$ experiments with different seeds. None of the uniformly random graph models found an instance ${\boldsymbol{A}}$ that satisfies $L({\boldsymbol{A}})/\text{OPT}({\boldsymbol{A}})=2$. However, all five HiSampler models succeeded in finding an instance ${\boldsymbol{A}}$ that satisfies $L({\boldsymbol{A}})/\text{OPT}({\boldsymbol{A}})=2$. This shows that HiSampler is useful for estimating the approximation ratios of approximation algorithms.
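
The hardness value above can be sketched on small graphs as follows. The text does not specify which greedy variant is used, so this sketch assumes the classical edge-picking (maximal matching) greedy algorithm, whose approximation ratio is $2$; it also computes $\text{OPT}$ by brute force, which is only feasible for very small graphs and is not necessarily how the experiments obtained it. All function names are hypothetical.

```python
import math
from itertools import combinations


def greedy_vertex_cover(edges):
    """Assumed greedy variant: scan edges in order and, whenever an edge is
    uncovered, add both of its endpoints to the cover."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


def optimal_cover_size(edges):
    """Brute-force minimum vertex cover size (exponential time, tiny graphs only)."""
    vertices = sorted({v for e in edges for v in e})
    for k in range(len(vertices) + 1):
        for subset in combinations(vertices, k):
            chosen = set(subset)
            if all(u in chosen or v in chosen for u, v in edges):
                return k
    return len(vertices)


def approximation_ratio(edges):
    """L(A) / OPT(A); defined as 1.0 for a graph without edges (assumption)."""
    edges = list(edges)
    if not edges:
        return 1.0
    return len(greedy_vertex_cover(edges)) / optimal_cover_size(edges)


def hardness(edges):
    """r(A) = exp(10 * L(A) / OPT(A)), the hardness value used in the text."""
    return math.exp(10.0 * approximation_ratio(edges))
```

Under this assumed variant, a graph consisting of disjoint edges such as `[(0, 1), (2, 3)]` attains the worst-case ratio of $2$, while a triangle attains ratio $1$; the exponential makes the gap between such instances much larger than the gap in the raw ratio.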

5.6 Effective Patterns

We demonstrate how to extract meaningful patterns from the hard graph distribution. To this end, we use frequent subgraph mining (Inokuchi et al., 2000; Kuramochi and Karypis, 2001), which discovers frequent patterns that appear in a database of graphs. We sampled $1000$ graphs from the distribution that HiSampler learns for DSATUR. We use gSpan (Yan and Han, 2002) to extract their frequent subgraphs. Figure 4 lists the frequent subgraphs that have at least $5$ edges and appear in more than $950$ sampled graphs. One of the subgraphs in Figure 4 corresponds to the 4-clique highlighted in orange in Figure 3, and another is a subgraph in which a small unsolvable graph is connected to a path, which is an effective structure as we analyzed in Section 5.3. This indicates that frequent subgraph mining tools are useful for extracting meaningful structure from the hard graph distribution.

6 Conclusion

This work tackled the problem of learning the distribution of hard instances using machine learning for the first time. We proposed HiSampler to model the hard instance distribution of graph algorithms. HiSampler is applicable to any graph algorithm without any prior knowledge. We demonstrated the effectiveness of HiSampler using seven algorithms for four graph problems. Furthermore, we showed that hard instances provided insight to analyze and accelerate the algorithm. We also showed that frequent subgraph mining extracts meaningful patterns from the hard graph distribution.

We discuss some directions for future work. Many existing works have tackled molecular generation using deep learning models. Comparing HiSampler with these methods and utilizing them for modeling the hard graph distribution is important future work. In addition, we model the distribution of graphs using adjacency matrices. We do not take isomorphism into account because some algorithms utilize node indices for tie-breaking. However, this limits the effectiveness of HiSampler for algorithms that utilize isomorphism by, for example, restarting with randomization. Modeling the symmetry of graphs for such algorithms is promising future work.

7 Acknowledgments

This work was supported by JSPS KAKENHI Grant Number 15H01704. MY is supported by the JST PRESTO program JPMJPR165A. We thank Yasuaki Kobayashi and Alessio Conte for discussions about the extensions of our proposed method.

Bibliography

1. Achlioptas et al. [2000] Dimitris Achlioptas, Carla P. Gomes, Henry A. Kautz, and Bart Selman. Generating satisfiable problem instances. In AAAI, pages 256–261, 2000.
2. Bojchevski et al. [2018] Aleksandar Bojchevski, Oleksandr Shchur, Daniel Zügner, and Stephan Günnemann. NetGAN: Generating graphs via random walks. In ICML, pages 609–618, 2018.
3. Brélaz [1979] Daniel Brélaz. New methods to color vertices of a graph. Commun. ACM, 22(4):251–256, 1979.
4. Bron and Kerbosch [1973] Coenraad Bron and Joep Kerbosch. Finding all cliques of an undirected graph (algorithm 457). Commun. ACM, 16(9):575–576, 1973.
5. Cao and Kipf [2018] Nicola De Cao and Thomas N. Kipf. MolGAN: An implicit generative model for small molecular graphs. CoRR, abs/1805.11973, 2018. URL http://arxiv.org/abs/1805.11973.
6. Chaitin et al. [1981] Gregory J. Chaitin, Marc A. Auslander, Ashok K. Chandra, John Cocke, Martin E. Hopkins, and Peter W. Markstein. Register allocation via coloring. Comput. Lang., 6(1):47–57, 1981.
7. Cheeseman et al. [1991] Peter C. Cheeseman, Bob Kanefsky, and William M. Taylor. Where the really hard problems are. In IJCAI, pages 331–340, 1991.
8. Cotta and Moscato [2003] Carlos Cotta and Pablo Moscato. A mixed evolutionary-statistical analysis of an algorithm’s complexity. Appl. Math. Lett., 16(1):41–47, 2003.