Optimality Clue for Graph Coloring Problem

Alexandre Gondran (ENAC); Laurent Moalic (Université de Haute-Alsace (UHA))

arXiv:1812.07734·cs.DM·December 20, 2018


TL;DR

This paper introduces a novel method called optimality clue that uses randomized heuristics to estimate the likelihood of a solution being optimal in the Graph Coloring Problem, validated on benchmark instances.

Contribution

It presents a new approach to verify solution optimality in GCP by estimating the number of colorings using randomized heuristics, enabling practical optimality proofs.

Findings

01. Effective in confirming optimality on benchmark instances

02. Provides a probabilistic upper bound for the number of colorings

03. Works with standard heuristics like HEAD for large graphs

Abstract

In this paper, we present a new approach which qualifies or not a solution found by a heuristic as a potential optimal solution. Our approach is based on the following observation: for a minimization problem, the number of admissible solutions decreases with the value of the objective function. For the Graph Coloring Problem (GCP), we confirm this observation and present a new way to prove optimality. This proof is based on the counting of the number of different k-colorings and the number of independent sets of a given graph G. Exact solutions counting problems are difficult problems (\#P-complete). However, we show that, using only randomized heuristics, it is possible to define an estimation of the upper bound of the number of k-colorings. This estimate has been calibrated on a large benchmark of graph instances for which the exact number of optimal k-colorings is known. Our…


Tables

Table 1: Distribution of 2031 RCBII graph instances ($\chi$ known for all instances)

Dataset	Condition	#instances	opt. clue	not opt. clue
Control	$\mathcal{N} > 10^{6}$	862	0	862
Reference	$\mathcal{N} \leq 10^{6}$ and $i(G) \leq \mathcal{N}$	393	0	393
Reference	$\mathcal{N} \leq 10^{6}$ and $i(G) > \mathcal{N}$	566	449	117
Test	$\mathcal{N}$ unknown	210	39	171
Total		2031

Table 2: Results of optimality clue tests for graphs of DIMACS benchmark with $p<t$.

Instances	$\| V \|$	$d$	$χ (G)$	$k$	$i (G)$	$𝒩 (G, k)$	$t$	$p$	$U B (G, k, p, t)$	Opt. clue	time (s)	$\underline{χ} (G)$	time(s)[15]
DSJC125.5	125	0.5	17	17	537,508	?	1,000	767	141,503	True	161	17	274
DSJC125.9	125	0.9	44	44	1,249	?	1,000	998	$+ \infty$	False	28	44	7
DSJC250.9	250	0.9	72	72	6,555	?	1,000	889	423,733	False	1,963	72	11,094
flat1000_50_0	1,000	0.49	50	50	$> 10^{7}$	?	1,000	1	2	True	25,694	50	3,331
flat1000_60_0	1,000	0.49	60	60	$> 10^{7}$	?	1,000	1	2	True	44,315	60	29,996
le450_5a	450	0.06	5	5	$> 10^{7}$	32	1,000	32	69	True	60	5	<0.1[21]
le450_5b	450	0.06	5	5	$> 10^{7}$	1	1,000	1	2	True	138	5	<0.1[21]
le450_5c	450	0.1	5	5	$> 10^{7}$	1	1,000	1	2	True	28	5	<0.1[21]
le450_5d	450	0.1	5	5	$> 10^{7}$	8	1,000	8	16	True	20	5	<0.1[21]
le450_15c	450	0.17	15	15	$> 10^{7}$	?	1,000	919	554,866	True		15	<0.1[21]
le450_15d	450	0.17	15	15	$> 10^{7}$	?	1,000	579	26,041	True		15	<0.1[21]
myciel3	11	0.36	4	4	102	520	1,000	435	7,105	False	10	4	<0.1
queen5_5	25	0.53	5	5	461	2	1,000	2	4	True	9	5	<0.1[21]
queen6_6	36	0.46	7	7	2,634	20	1,000	20	42	True	10	7	<0.1
queen7_7	49	0.4	7	7	16,869	4	1,000	4	8	True	10	7	<0.1[21]
queen8_8	64	0.36	9	9	118,968	$>$ 154,068	1,000	993	$+ \infty$	False	11	9	<1
r125.1c	125	0.97	46	46	787	?	1,000	977	934,514	False	5,962	46	<0.1[21]
DSJC250.5	250	0.5	?	28	24,791,612	?	1,000	999	$+ \infty$	False	1,696	26	18
DSJC500.5	500	0.5	?	47	$> 10^{7}$	?	341	281	32,731	True	out of time	43	439
				48		?	100,000	100,000	$+ \infty$	False
DSJC500.9	500	0.9	?	126	35,165	?	1,000	927	59,623	False	234,496	123	100
DSJC+300.1_8	300	0.1	?	8	$> 10^{7}$	?	1,000	3	6	True	22,896	5	<0.1[21]
DSJC+300.5_31	300	0.5	?	31	$> 10^{7}$	?	1,000	2	4	True	69,363	29	20
DSJC+400.5_39	400	0.5	?	39	$> 10^{7}$	?	1,000	96	252	True	386,037	36	135

Equations

\mathcal{N}(G,k+1)\geq i(G)-k+1

i_{max}(G)\approx i_{B}(G)=\sum_{p=1}^{n}\binom{n}{p}(1-d)^{\binom{p}{2}}

UB(G,k,p,t)=\left\{\begin{array}[]{ll}p+p^{\alpha\frac{t+p}{t}}&\text{if }p<t\times 0.99\\ +\infty&\text{otherwise}\end{array}\right.

\langle f,X\rangle\left\{\begin{array}[]{rl}\displaystyle\text{Minimize}&f(x)\\[2.84526pt] \text{s.t.}&x\in X\end{array}\right.


Full text

Optimality Clue for Graph Coloring Problem

Alexandre Gondran

ENAC

French Civil Aviation University

Toulouse

France


Laurent Moalic

UHA

University of Upper Alsace

Mulhouse

France


Abstract

In this paper, we present a new approach which qualifies or not a solution found by a heuristic as a potential optimal solution. Our approach is based on the following observation: for a minimization problem, the number of admissible solutions decreases with the value of the objective function. For the Graph Coloring Problem (GCP), we confirm this observation and present a new way to prove optimality. This proof is based on the counting of the number of different $k$ -colorings and the number of independent sets of a given graph $G$ .

Exact solutions counting problems are difficult problems (#P-complete). However, we show that, using only randomized heuristics, it is possible to define an estimation of the upper bound of the number of $k$ -colorings. This estimate has been calibrated on a large benchmark of graph instances for which the exact number of optimal $k$ -colorings is known.

Our approach, called optimality clue, builds a sample of $k$-colorings of a given graph by running one randomized heuristic many times on the same graph instance. We use the evolutionary algorithm HEAD [26], which is one of the most efficient heuristics for GCP.

Optimality clue agrees with the standard definition of optimality on a large number of instances of the DIMACS and RBCII benchmarks for which the optimum is known. Then, we show the clue of optimality for another set of graph instances.

keywords: Optimality, Metaheuristics, Near-optimal.

1 Introduction

For a given integer $k\geq 1$ , a $k$-coloring of a given graph $G=(V,E)$ is an assignment of one of $k$ distinct colors to each vertex $v\in V$ in the graph, so that no two adjacent vertices (linked by an edge $e\in E$ ) are given the same color. The Graph Coloring Problem (GCP) is to find, for a given graph $G$ , its chromatic number $\chi(G)$ , the smallest $k$ such that there exists a $k$-coloring of $G$ . GCP is NP-hard [19]; the associated decision problem, the $k$-coloring problem ( $k$-CP), is NP-complete for $k\geq 3$ . For an NP-hard optimization problem, no polynomial-time exact algorithm exists unless P $=$ NP. Therefore, for large instances of an NP-hard minimization problem, exact algorithms must be stopped before they terminate. In this case, exact algorithms such as branch and bound methods provide a lower bound of the optimal value of the objective function. Heuristic approaches are then the only way to find, in reasonable running time, a “good” solution in terms of objective function value, i.e. an upper bound of the optimal value. However, even if an admissible solution is found, its distance to the optimal solution remains unknown, except for approximation algorithms (notice that it is still NP-hard to approximate $\chi(G)$ within $n^{1-\epsilon}$ for any $\epsilon>0$ [35]). The optimality gap is the difference between the upper bound (found by a heuristic) and the lower bound (found by a partial exact method). Optimality is proven only when this gap is equal to zero. Unfortunately, for large instances of an NP-hard problem, this gap is often large. This is particularly true for challenging instances [15, 26] of the GCP in the DIMACS benchmark [18]. This paper addresses the following question: what can be done in this situation? Is it possible to prove optimality for a graph coloring problem instance using only heuristic algorithms?

The answer is yes for specific classes of graphs: for example, there exist efficient polynomial-time exact algorithms to find $\chi(G)$ for interval graphs, chordal graphs and cographs [27, 31]. For some graphs, such as 1-perfect graphs, for which the chromatic number $\chi(G)$ is equal to the size of the maximum clique $\gamma(G)$ , it is possible to solve the dual problem, the Maximum Clique Problem (MCP), with another heuristic and to conclude optimality if the size of the largest clique found is equal to the smallest number of colors used by a coloring of $G$ , also found by a heuristic. In this specific case, the optimality gap (or duality gap between GCP and MCP) is zero. (A perfect graph is a graph in which the chromatic number of every induced subgraph equals the size of the largest clique of that subgraph. 1-perfect graphs are more general than perfect graphs. There exist polynomial-time exact algorithms to find $\chi(G)$ for perfect graphs [13], but they are slow in practice. Line graphs of bipartite graphs, chordal graphs, interval graphs and cographs are subclasses of perfect graphs.)

However, the answer to the question is no in the general case: a heuristic finds approximate solutions (upper bounds), and although the coloring found may be optimal, the heuristic alone cannot prove it. Therefore, the question becomes: what can be done with a heuristic beyond finding an approximate solution? Is it possible to define a kind of optimality index for a graph coloring problem instance?

We show in this article that a heuristic does not only find an upper bound of $\chi(G)$ but can also be used to count the number of different $k$-colorings (i.e. the number of admissible solutions having the same objective function value). Our approach is based on the fact that the number of different $k$-colorings decreases dramatically when the number of colors $k$ decreases. Figure 2 gives a typical example for a random graph with 30 vertices, a density of $0.9$ and $\chi(G)=16$ . The number of colorings with exactly $k$ colors (blue bars) and the total number of colorings with $k$ colors or fewer (red bars), noted $\mathcal{N}(G,k)$ , are computed exactly for all values from $k=16$ to $k=30$ . $\mathcal{N}(G,k)$ decreases exponentially when $k$ decreases to $\chi(G)$ . We prove a theorem showing that when the number of $k$-colorings is lower than a given value (the number of independent sets of $G$ , where an independent set is a subset of vertices of $G$ such that no two distinct vertices in it are adjacent), the optimum is reached: $\chi(G)=k$ .

In this article, we try to apply the proposed theorem in order to prove optimality.

Brief solutions counting review

Our work tackles the problem of counting the solutions of NP-complete problems, which has been widely studied for the Boolean satisfiability problem (#SAT) and for the Constraint Satisfaction Problem (#CSP); the $k$-coloring problem is a special case of CSP. These counting problems are known to be #P-complete [33]. A recent survey on #CSPs is given in [17]. Even when a problem is not NP-hard, counting its solutions is often hard. Specific studies on counting the solutions of $k$-CP are presented in [16, 7, 25]. Because exact counting is in many cases a complex problem, statistical or approximate counting is often considered. The problem of uniformly sampling the set of solutions is closely related to the problem of counting solutions, and many works address uniform or near-uniform sampling, such as [11, 12, 34]. The objective is to count by sampling. Frieze and Vigoda [8] give a survey on the use of Markov Chain Monte Carlo algorithms for approximately counting the number of $k$-colorings. The ergodicity or quasi-ergodicity properties of heuristics that guarantee a uniform sampling are discussed in depth in [6]. However, theoretical results are obtained for high values of $k\geq\Delta$ , where $\Delta$ is the maximum degree of the graph $G$ , which is very far from $\chi(G)$ for challenging graphs. On the other hand, when tests are performed with $k=\chi(G)$ as in [7], the graph instances considered often have more than $10^{20}$ $k$-colorings. If the number of $k$-colorings is too high (higher than the number of independent sets), then it is not possible to apply our theorem. Therefore, in practice, our approach can be applied to graphs that do not have too many optimal colorings; we considered graphs with at most 1 million different optimal colorings.

To our knowledge, this is the first time that solution counting is used to prove optimality. We define a procedure, called optimality clue, in order to apply the proposed theorem. First, we build a sample of $k$-colorings of a given graph $G$ by running the same randomized heuristic algorithm many times (about 1,000 times). In this study, we use HEAD (open-source code available at: github.com/graphcoloring/HEAD), our open-source memetic algorithm (i.e. a hybridization of tabu search and an evolutionary algorithm), which is a very efficient heuristic for solving GCP [26].

In this sample, some colorings may appear several times and others only once. The number of different $k$-colorings inside the sample is used to build an estimate of the total number of colorings with $k$ colors. This estimator has been calibrated on a large benchmark of graph instances for which the number of optimal $k$-colorings is exactly known. Because we have no guarantee that the sampling is uniform in the general case, we have no guarantee that our estimator is always exact.

Moreover, building a sample of $k$-colorings is time-consuming, so the size of the sample should be “reasonable”. Therefore, the graphs for which our optimality clue can be calculated are graphs that do not have too many optimal $k$-colorings (i.e. fewer than about one million). Of course, it is not possible to know a priori whether a given graph has more or fewer than 1 million optimal colorings. Our approach thus provides a clue that a coloring found by the heuristic is perhaps optimal (positive conclusion) but never denies it (no negative conclusion): in many cases we cannot draw any conclusion.

This article is organized as follows. In Section 2, we present the new optimality proof for GCP based on solution counting. Our general approach, called optimality clue, is defined in Section 3. In Section 4, we detail how we calculate the estimate of the number of $k$-colorings using benchmark graph instances. Numerical tests and experiments are presented in Section 5. Finally, we conclude in Section 6.

2 Proof of Optimality by solutions counting

Notice that there are different ways to count the $k$-colorings of a given graph $G$ . When counting the number of different $k$-colorings, we have to take into account the permutations of the color classes. We consider a $k$-coloring not as an assignment of one color among $k$ to each vertex but as a partition of the vertices of the graph into $k$ independent sets. An Independent Set (IS) or stable set is a set of vertices of $G$ , no two of which are adjacent. Two $k$-colorings $c_{1}$ and $c_{2}$ are considered identical if they correspond to the same partition of $G$ . The distance between two $k$-colorings that is taken into account is the set-theoretic partition distance used in [10, 14, 26], which is independent of the permutation of the color classes. In previous works on solution counting for $k$-CP [7], the authors counted the total number of $k$-colorings including all the permutations, as in the example of Figure 2; the number of different $k$-colorings obtained this way is $k!$ times higher than with our way of counting. This makes their methods inapplicable to our study. We write $\Omega(G,k)$ for the set of all $k$-colorings of the graph $G$ . A $k$-coloring can use exactly $k$ colors or fewer, so $\Omega(G,k-1)\subset\Omega(G,k)$ . The cardinality of $\Omega(G,k)$ is noted $\mathcal{N}(G,k)=|\Omega(G,k)|$ .
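Counting colorings as partitions rather than as color assignments can be illustrated with a short sketch. The helper names `canonical_form` and `count_distinct` are hypothetical and are not taken from HEAD; colorings are assumed to be given as lists of color labels indexed by vertex.

```python
def canonical_form(coloring):
    """Relabel colors in order of first appearance.

    Two colorings that induce the same partition of the vertices
    get the same canonical tuple, whatever color names they use.
    """
    relabel = {}
    return tuple(relabel.setdefault(color, len(relabel)) for color in coloring)


def count_distinct(colorings):
    """Number of different partitions among a list of colorings."""
    return len({canonical_form(c) for c in colorings})


sample = [
    [0, 1, 0, 2],  # classes {0, 2}, {1}, {3}
    [2, 0, 2, 1],  # same partition, colors permuted
    [0, 1, 2, 1],  # classes {0}, {1, 3}, {2}
]
print(count_distinct(sample))  # 2
```

The first two colorings differ only by a permutation of the colors, so they count as one coloring in the sense used here.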

Our approach is based on the following fact:

Lemma 1

Let $G$ be a graph and $k\geq 1$ an integer. If there exists at least one $k$-coloring of $G$ , then there exist at least $i(G)-k+1$ different $(k+1)$-colorings of $G$ :

$$\mathcal{N}(G,k+1)\geq i(G)-k+1,$$

where $i(G)$ is the number of independent sets of $G$ .

Proof 2.1.

Notice that a $k$-coloring of a graph $G=(V,E)$ is a partition of the $|V|$ vertices into $k$ IS. Indeed, vertices colored with the same color inside a $k$-coloring necessarily form an IS. Conversely, it is always possible to color all vertices of any IS with the same color. We note $IS(G)=\{U\subset V\ |\ \forall (x,y)\in U^{2},\ \{x,y\}\notin E\}$ the set of all the IS of $G$ ; then $i(G)=|IS(G)|$ .

Starting with one coloring of $G$ with exactly $k$ colors, for each independent set of $G$ except the $k$ IS of the $k$-coloring, it is possible to recolor all vertices of this independent set with a new color (the $(k+1)$ th color). We obtain in this way one different $(k+1)$-coloring for each different independent set, so we count at least a total of $\mathit{i}(G)-k$ different colorings with exactly $(k+1)$ colors. Then, $\mathcal{N}(G,k+1)\geq\mathit{i}(G)-k+1$ because the starting $k$-coloring must also be counted.

Then, we obtain the following theorem:

Theorem 1

Let $G$ be a graph and $k\geq 1$ an integer. Let $\mathcal{N}(G,k)$ be the number of $k$-colorings of $G$ and $\mathit{i}(G)$ the number of independent sets of $G$ .

If $\mathit{i}(G)-k>\mathcal{N}(G,k)>0$ , then $\chi(G)=k$ .

Proof 2.2.

$\chi(G)\leq k$ because $\mathcal{N}(G,k)>0$ . If $\chi(G)<k$ , then there exists at least one $(k-1)$-coloring (i.e. $\mathcal{N}(G,k-1)>0$ ). If we add a new color, it is possible to take this $(k-1)$-coloring and to recolor any independent set of $G$ with the new color; we obtain in this way at least $\mathit{i}(G)-k$ different $k$-colorings (by Lemma 1). Therefore $\mathit{i}(G)-k\leq\mathcal{N}(G,k)$ , which contradicts the initial assumption.

For example, the studied graph in Figure 2 (30 vertices and density 0.9) has 38 different colorings with 16 colors: $\mathcal{N}(G,k=16)=38$ ; moreover this graph has 78 IS: $i(G)=78$ , then the theorem is applicable with $k=16$ because: $\mathit{i}(G)-k=78-16=62>38=\mathcal{N}(G,k)>0$ . Then, thanks to the theorem we can conclude that $\chi(G)=16$ . Moreover, for $k=17$ , $\mathcal{N}(G,k=17)=3121>i(G)-k=61$ , so the theorem is not applicable.
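The condition of Theorem 1 can be checked by brute force on a very small graph. The sketch below is only an illustration, not the counting method used in the paper: the function names are hypothetical, the 5-cycle is an example chosen here, and only non-empty independent sets are counted (an assumption).

```python
from itertools import combinations


def count_independent_sets(n, edges):
    """Count non-empty independent sets by enumeration (tiny graphs only)."""
    edge_set = {frozenset(e) for e in edges}
    total = 0
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if all(frozenset(pair) not in edge_set
                   for pair in combinations(subset, 2)):
                total += 1
    return total


def count_colorings(n, edges, k):
    """Count partitions of the vertices into at most k independent sets."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    colors = [-1] * n

    def extend(v, used):
        if v == n:
            return 1
        total = 0
        # Vertex v may reuse a color already in use or open exactly one new
        # color, so each partition is generated once (no color permutations).
        for c in range(min(used + 1, k)):
            if all(colors[u] != c for u in adj[v]):
                colors[v] = c
                total += extend(v + 1, max(used, c + 1))
                colors[v] = -1
        return total

    return extend(0, 0)


def theorem_applies(n, edges, k):
    """True when i(G) - k > N(G, k) > 0, which implies chi(G) = k."""
    n_colorings = count_colorings(n, edges, k)
    return count_independent_sets(n, edges) - k > n_colorings > 0


cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(count_independent_sets(5, cycle5))  # 10
print(count_colorings(5, cycle5, 3))      # 5
print(theorem_applies(5, cycle5, 3))      # True
```

For the 5-cycle, $10-3=7>5>0$ , so the test concludes $\chi=3$ , which agrees with the known chromatic number of an odd cycle.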

Corollary 1

Let $G$ be a graph and $k\geq 1$ an integer. Let $\overline{\mathcal{N}}(G,k)$ be an upper bound of $\mathcal{N}(G,k)$ and $\underline{\mathit{i}}(G)$ a lower bound of $\mathit{i}(G)$ .

If $\mathcal{N}(G,k)>0$ and $\underline{\mathit{i}}(G)-k>\overline{\mathcal{N}}(G,k)$ , then $\chi(G)=k$ .

3 Optimality Clue

We propose in this paper to apply Corollary 1, which requires finding an appropriate upper bound of the number of $k$-colorings of $G$ , $\overline{\mathcal{N}}(G,k)$ , and a lower bound of the number of independent sets of $G$ , $\underline{\mathit{i}}(G)$ .

3.1 IS counting

There exist many algorithms [4, 5, 28, 30] for counting all the maximal independent sets of a graph $G$ (or, equivalently, counting all the maximal cliques in $\overline{G}$ , the complementary graph of $G$ ). A maximal clique is a clique that cannot be extended by including one more adjacent vertex. A maximum clique is a clique that has the largest size in a given graph; a maximum clique is therefore always maximal, but the converse does not hold. The definitions for IS are analogous. By definition, the number of maximal IS, noted $i_{max}(G)$ , is a lower bound of $i(G)$ . Those algorithms are based on enumeration. Because we focus this study on graphs having fewer than 1 million optimal solutions, we can stop the enumeration after finding 1 million IS. Generally, $\mathit{i}(G)$ is very high except for graphs with very high density. Real-life graphs often have a low density, so their $\mathit{i}(G)$ is very high. Moreover, a simple lower bound is given by [29]: $\mathit{i}(G)\geq 2^{\alpha(G)}+n-\alpha(G)$ , where $\alpha(G)$ is the size of the largest independent set of $G$ and $n$ the number of vertices. Bollobás’ book [2] (p.283) also gives a statistical number of maximal cliques of size $p$ for a random graph. We then conclude that:

$$i_{max}(G)\approx i_{B}(G)=\sum_{p=1}^{n}\binom{n}{p}(1-d)^{\binom{p}{2}}$$

with $n$ the number of vertices and $d$ the density of a random graph $G$ .
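As a rough numerical illustration, the two quantities mentioned above can be evaluated directly: the simple lower bound $2^{\alpha(G)}+n-\alpha(G)$ and the random-graph approximation $i_{B}(G)=\sum_{p=1}^{n}\binom{n}{p}(1-d)^{\binom{p}{2}}$ . The function names are hypothetical, and the sketch only evaluates these formulas; it does not enumerate the independent sets of an actual graph.

```python
from math import comb


def simple_lower_bound(n, alpha):
    """Lower bound 2**alpha + n - alpha on i(G), with alpha the size of
    the largest independent set and n the number of vertices."""
    return 2 ** alpha + n - alpha


def random_graph_estimate(n, d):
    """Evaluate i_B = sum over p of C(n, p) * (1 - d)**C(p, 2)
    for a random graph with n vertices and density d."""
    return sum(comb(n, p) * (1.0 - d) ** comb(p, 2) for p in range(1, n + 1))


# Orders of magnitude for 125 vertices at three densities.
for density in (0.1, 0.5, 0.9):
    print(density, f"{random_graph_estimate(125, density):.3e}")
```

The estimate falls quickly as the density grows, which matches the remark that $\mathit{i}(G)$ is very high except for very dense graphs.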

In this study, we use Cliquer (code available at: users.aalto.fi/~pat/cliquer.html), an exact branch-and-bound algorithm developed by Patric Östergård [28] that enumerates all cliques (an IS is a clique in the complementary graph). To count all IS of a graph, one executes: ./cl <complement graph> -a -m 1 -M <k>
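Since the enumeration runs on the complementary graph, that graph has to be built first. The sketch below is one possible way to do it; the function names are hypothetical, and it assumes that the clique tool accepts the ASCII DIMACS edge format (`p edge n m` header, 1-indexed `e u v` lines), which should be checked against the tool's documentation.

```python
from itertools import combinations


def complement_edges(n, edges):
    """Edges of the complementary graph on vertices 0..n-1."""
    present = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(range(n), 2)
            if frozenset((u, v)) not in present]


def to_dimacs(n, edges):
    """Serialize a graph in ASCII DIMACS edge format (1-indexed vertices)."""
    lines = [f"p edge {n} {len(edges)}"]
    lines += [f"e {u + 1} {v + 1}" for u, v in edges]
    return "\n".join(lines) + "\n"


# Path 0-1-2: the complement contains the single edge (0, 2).
path = [(0, 1), (1, 2)]
print(to_dimacs(3, complement_edges(3, path)), end="")
```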

It is more complex to evaluate $\overline{\mathcal{N}}(G,k)$ ; Section 4 presents a way to build an experimental upper bound of $\mathcal{N}(G,k)$ . We call this upper bound experimental because it is based on experimental tests on benchmark graph instances, so there is no absolute guarantee that it is an upper bound.

3.2 Procedure

We define here the procedure that we call Optimality Clue for graph coloring. Let $G$ be a graph and $k>0$ a positive integer that we suspect to be the chromatic number of $G$ . The proposed approach is based on the following five steps:

1. Build a sample of $t$ $k$-colorings of $G$ : we run the memetic algorithm HEAD on $G$ as many times as needed to obtain $t$ legal $k$-colorings. These solutions form the sample, whose size is $t$ . In general, we take $t=1,000$ when possible.

2. Count the number of different $k$-colorings inside the sample. This number is equal to $p$ . Of course $0\leq p\leq t$ .

3. Estimate an upper bound of $\mathcal{N}(G,k)$ as $UB(p,t)$ (cf. Section 4); this upper bound is a function of $t$ and $p$ .

4. Compute $i(G)$ , the number of IS, or at least a lower bound if $i(G)>10^{6}$ , with an exact algorithm (Cliquer).

5. If $i(G)>UB(p,t)$ , then we conclude that the solutions of the sample have a clue to be optimal: chances are that $k$ is equal to $\chi(G)$ .
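Steps 3 and 5 can be sketched as follows, using the piecewise form $UB(G,k,p,t)=p+p^{\alpha\frac{t+p}{t}}$ for $p<0.99\,t$ and $+\infty$ otherwise, as listed among the equations of this document. The calibration constant $\alpha$ is not specified in this portion of the text, so it is left as a required parameter; the function names are hypothetical, and $p$ and the bound on $i(G)$ are assumed to come from steps 2 and 4.

```python
import math


def upper_bound(p, t, alpha):
    """Experimental upper bound on N(G, k) from p distinct colorings
    observed in a sample of size t. alpha is a calibration constant
    that must be supplied by the caller."""
    if p < 0.99 * t:
        return p + p ** (alpha * (t + p) / t)
    return math.inf


def optimality_clue(p, t, i_lower_bound, alpha):
    """Step 5: the clue holds when the (lower bound on the) number of
    independent sets exceeds the estimated upper bound."""
    return i_lower_bound > upper_bound(p, t, alpha)


# With p = 1 the bound equals 2 whatever the value of alpha,
# so alpha=1.0 below is only a placeholder.
print(upper_bound(1, 1000, alpha=1.0))             # 2.0
print(optimality_clue(1, 1000, 10**7, alpha=1.0))  # True
print(upper_bound(998, 1000, alpha=1.0))           # inf
```

When almost every coloring of the sample is different ( $p\geq 0.99\,t$ ), the bound is infinite and no conclusion is drawn.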

3.2.1 Uniform sample

If the sample is uniform (i.e. all $k$-colorings in the sample are drawn uniformly at random in $\Omega(G,k)$ ), then there exist statistical methods to count solutions and to build an upper bound with a statistical guarantee, for example the capture-recapture methods commonly used in ecology to estimate the size of an animal population: the Petersen method [20] and the Jolly-Seber method [1]. However, this is not our case: we have no guarantee that our sample of solutions is uniform or near uniform. HEAD is a memetic algorithm that explores the space of non-legal $k$-colorings: a non-legal $k$-coloring is a coloring with at most $k$ colors in which two adjacent vertices (linked by an edge) may have the same color (such an edge is called a conflicting edge). The objective of HEAD is to reduce the number of conflicting edges to zero, that is, to obtain a legal $k$-coloring. HEAD is an evolutionary algorithm with a population of size two. At each generation, a tabu search and a crossover are applied to the two non-legal $k$-colorings. The sample distribution depends on the fitness landscape properties [24, 23] (the fitness landscape itself depends on the neighborhood used for the tabu search and on the crossover used), and there is no reason for this distribution to be uniform. A smooth landscape (respectively a rugged landscape) around a legal $k$-coloring will increase (resp. decrease) the probability of finding this $k$-coloring. Figure 4 represents the frequency of the 319 optimal $46$-colorings of the <r140_90.4> graph of the RCBII benchmark (140 vertices and density 0.9) in a sample of size 100,000 found by the HEAD heuristic. In this typical graph instance, the most frequently found coloring appears about $10^{3}$ times more often than the least frequent one, which is the same order of magnitude as in similar studies [34].
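For comparison, the simplest capture-recapture estimate, the Lincoln-Petersen estimator $\hat{N}=n_{1}n_{2}/m$ , can be sketched as below. It is valid only under the uniform-sampling assumption, which, as noted above, does not hold for HEAD; the function name and the toy data are hypothetical.

```python
def lincoln_petersen(first_batch, second_batch):
    """Estimate a population size from two batches of observations.

    n1 = distinct items seen in the first batch (the "marked" ones),
    n2 = number of draws in the second batch,
    m  = draws of the second batch that hit a marked item.
    """
    marked = set(first_batch)
    recaptured = sum(1 for item in second_batch if item in marked)
    if recaptured == 0:
        return float("inf")  # no recapture: the estimate is unbounded
    return len(marked) * len(second_batch) / recaptured


# Toy data: 100 marked items, 80 later draws, 20 of them already marked.
first = list(range(100))
second = list(range(80, 160))
print(lincoln_petersen(first, second))  # 400.0
```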

Another approach is to take into account the ergodicity of an algorithm, which is its capability to explore the whole search space. More precisely, an algorithm is ergodic if it is possible (with non-zero probability) to reach any $k$-coloring from any other $k$-coloring in a finite number of iterations. Random walks or Metropolis algorithms (with a sufficiently high positive temperature) are ergodic algorithms since there is always a non-zero probability of escaping from a local minimum. However, those algorithms are very inefficient in practice at finding an optimal $k$-coloring in the general case.

3.2.2 Sample size

The choice of $t$ , the size of the sample, is very important for two reasons. First, in practice, building a sample of $k$-colorings can be very time-consuming, so the sample should have a reasonable size. We take $t=1,000$ for most of the graph instances. However, the more challenging the graph instance, the longer HEAD takes to find one $k$-coloring. Therefore, it is not possible to build a sample of size 1,000 for all graphs, for example for the <DSJC500.5> graph of DIMACS (cf. Table 5.1).

The second reason is more theoretical. We have limited the maximum number of different optimal solutions to 1 million for a graph to be considered by our approach. In fact, we choose 1 million because it is equal to $t^{2}$ with $t=1,000$ . Indeed, if the sample is uniformly drawn at random in $\Omega(G,k)$ , the probability $q$ that at least two colorings of the sample are identical is equal to: $q=1-\frac{\mathcal{N}!}{\mathcal{N}^{t}(\mathcal{N}-t)!}\simeq 1-e^{-\frac{t(t-1)}{2\mathcal{N}}}$ , hence $\mathcal{N}\simeq-\frac{t(t-1)}{2\ln(1-q)}$ . (This problem is linked to the birthday problem, which shows that in a room of just 23 people there is a 50-50 chance that two people have the same birthday. In our case, the number of days in a year is $\mathcal{N}$ and the number of people is the size $t$ of the sample.) We also call $q$ the collision probability. So, if $q=0.5$ then $\mathcal{N}\simeq 720626$ , and if $q=0.393$ then $\mathcal{N}\sim t^{2}=10^{6}$ . Figure 4 represents the collision frequency, $q$ , as a function of the sample size, $t$ , for different sizes of $\Omega(G,k)$ . When $\mathcal{N}(G,k)=10^{5}$ and $t=1,000$ , it is almost impossible to miss a collision in the sample, but for $\mathcal{N}=10^{6}$ , there is around a 60% chance of missing a collision. However, missing a collision is not critical for our approach: the consequence is that the clue of optimality may not be applicable, but the risk of a false positive is avoided. A false positive occurs if the procedure of Section 3.2 improperly indicates the optimality clue when in reality the $k$-colorings are not optimal. Moreover, the collision frequency is higher for a non-uniform sample than for a uniform one.
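The numerical values quoted in this paragraph follow from the birthday-problem approximation and can be reproduced with a few lines. The function names are hypothetical; the formulas are the approximations $q\simeq 1-e^{-t(t-1)/(2\mathcal{N})}$ and $\mathcal{N}\simeq-t(t-1)/(2\ln(1-q))$ for a uniform sample.

```python
import math


def collision_probability(n_colorings, t):
    """Approximate probability that a uniform sample of size t drawn
    from n_colorings solutions contains at least one repeated solution."""
    return 1.0 - math.exp(-t * (t - 1) / (2.0 * n_colorings))


def solutions_for_collision(q, t):
    """Number of solutions for which a uniform sample of size t
    has collision probability q (inverse of the formula above)."""
    return -t * (t - 1) / (2.0 * math.log(1.0 - q))


print(round(solutions_for_collision(0.5, 1000)))       # 720626
print(round(collision_probability(10**6, 1000), 3))    # 0.393
print(round(collision_probability(10**5, 1000), 3))    # 0.993
```

With $t=1,000$ , a collision is missed about 60% of the time when $\mathcal{N}=10^{6}$ but less than 1% of the time when $\mathcal{N}=10^{5}$ .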

4 Estimate of the number of $k$ -colorings: $UB(G,k,p,t)$

4.1 Data sets

In order to define an estimator or at least an upper bound of the number of $k$-colorings, we need a large number of graph instances for which we know the exact number of $k$-colorings. Fabio Furini et al. [9] have published an open-source and very efficient version of the backtracking DSATUR algorithm [3], which returns the chromatic number of a given graph (code available at: lamsade.dauphine.fr/coloring/doku.php). DSATUR is one of the best exact algorithms for GCP, particularly for graphs with high density. We refer readers interested in an overview of exact methods for GCP to [22, 15].

We modified their DSATUR algorithm in order to count the total number of $k$-colorings. The pseudo code of the algorithm, called CDSATUR, is presented in algorithm 1. CDSATUR returns, for all values of $k$ , the exact value of $\mathcal{N}(G,k)$ taking into account the permutation of colors, and especially $\mathcal{N}(G,k=\chi(G))$ .

Fabio Furini et al. also published 2031 random GCP instances, called RCBII (instances available at the same address), with 60 to 140 vertices and densities between 0.1 and 0.9. This wide variety of graphs is our reference dataset. We complete this dataset with easy DIMACS graphs [18] for which $\chi(G)$ and $\mathcal{N}(G,\chi)$ are computable with CDSATUR.

The 2031 graphs of the RCBII benchmark have the characteristics described in Table 1. Notice that $\chi(G)$ is known for all these graphs [9]. First, we calculated $\mathcal{N}(G,\chi)$ with CDSATUR, with a time limit of $2400$ s. This time is enough for most of the graphs. There are only 210 graph instances of RBCII (out of 2031) for which CDSATUR does not have enough time to find $\mathcal{N}(G,\chi)$ . These 210 graphs are used to test our approach (test dataset).

Among the graphs for which $\mathcal{N}(G,\chi)$ can be determined, we consider only those with fewer than 1 million optimal solutions: they form the reference dataset (959 graph instances). Finally, inside the reference dataset we can distinguish the graph instances verifying $i(G)>\mathcal{N}(G,\chi)$ (566/959) from those that do not (393/959).

There remain 862 graphs out of the 2031 of the RBCII benchmark with more than 1 million optimal solutions. We decided to test our approach on those graphs (called the control dataset) to check whether the proposed algorithm can produce false positives.

4.2 Analysis of graph instances

Before determining an upper bound of $\mathcal{N}(G,\chi)$ , we investigate the possible links between standard features of a graph, such as its size (number of vertices), its density or its chromatic number, and the number of optimal colorings $\mathcal{N}(G,\chi)$ .

4.2.1 Links between $\mathcal{N}(G,\chi)$ , graph size and density

Graphs with the same size (number of vertices) and the same density can have very different numbers of optimal colorings. A typical example is given in Figure 6, which shows the distribution of 49 graph instances with 80 vertices and density 0.3 (<r80_30.*> of the RBCII benchmark) as a function of the number of solutions $\mathcal{N}(G,\chi)$ . Half of the graphs (25/49) have fewer than 100,000 optimal solutions while a third (18/49) have more than 1 million optimal solutions. There is no simple law that characterizes this distribution.

However, we can notice that the lower the density, the higher the number of optimal solutions. Indeed, Figure 6 presents the proportion of graphs with 70 vertices of the RBCII benchmark having more than 1 million colorings, depending on graph density. For a low density such as 0.1, nearly all graphs have more than 1 million optimal solutions, while no graph with a high density (equal to 0.9) does.

In order to have a finer view of the link between the number of optimal colorings and the graph density, we generated 1,000 random graphs with 50 vertices and density $d$ ( $d=0.1$ , $0.2,...,$ or $0.9$ ). Each line in Figure 8 represents (for each density) the proportion of graphs having fewer than $n$ optimal colorings, with $n$ between $10^{2}$ and $10^{6}$ . The pink line of Figure 8 shows for example that 50% of the graphs (with 50 vertices and density 0.3) have fewer than $10^{5}$ optimal colorings. The plots are quite similar for graphs with 60 or 70 vertices. The graph size seems to have only a slight influence on the number of optimal solutions.

4.2.2 Links between $\mathcal{N}(G,\chi)$ and $\chi(G)$

As shown in Figure 8, there is no obvious link between the chromatic number of a graph, $\chi(G)$ ( $y$ -axis) and the number of optimal colorings $\mathcal{N}(G,\chi)$ ( $x$ -axis). Each dot of Figure 8 corresponds to one graph of RCBII for which it is possible to calculate exactly $\mathcal{N}(G,\chi)$ with CDSATUR.

4.3 Upper bound function

In this section, we define an upper bound of $\mathcal{N}(G,k)$ based on the 953 graphs of the reference dataset. For a given graph $G$ , let $\Omega(G,k)=\{x_{1},...,x_{n}\}$ be the set of its $n$ different $k$ -colorings, where $n=|\Omega(G,k)|=\mathcal{N}(G,k)$ is unknown. We also have a sequence $W$ of $t$ independent samples: $W=(w_{1},...,w_{t})$ , where $w_{i}\in\Omega(G,k),\ \forall i=1...t$ . This sample $W$ is composed of $t$ independent successful runs of the HEAD algorithm. For all $j=1...n$ , we denote by $\#(x_{j})$ the number of occurrences of $x_{j}$ in $W$ . Among these $t$ colorings, we count $p$ different colorings in $W$ : $p=|\{x_{j}\in W,\ \#(x_{j})>0\}|$ . Hence $\mathcal{N}(G,k)\geq p$ and $t\geq p\geq 1$ . Figure 9 shows, for each graph of the reference dataset, the number of different colorings $p$ found by HEAD over a total of $t=1,000$ successful runs ( $x$ -axis) and the exact number of colorings, $\mathcal{N}(G,k)$ , calculated with CDSATUR ( $y$ -axis). Each dot corresponds to one graph of the reference dataset. The objective now is to determine an upper bound $UB$ of $\mathcal{N}(G,k)$ that is as small as possible. Indeed, in order to apply Theorem 1, we must have $i(G)-k>UB$ .

Figure 9-right, which is a zoom of the left figure for $p\leq 500$ , shows that for $p\ll t$ , $p$ is nearly linear in $\mathcal{N}(G,k)$ : $p\sim\mathcal{N}(G,k)$ . Thus, $p$ is a good candidate estimator of $\mathcal{N}(G,k)$ . When $p$ is close to $t$ , the range of $\mathcal{N}(G,k)$ values is very large, up to nearly $p^{2}=10^{6}$ , and $p$ is a very poor estimate of $\mathcal{N}(G,k)$ ; notice, however, that $\mathcal{N}(G,k)<p^{2}$ . We add to these figures a red line representing a possible upper bound of $\mathcal{N}(G,k)$ , which is equal to:
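Computing $p$ from a sample $W$ can be sketched as follows. This is an illustrative sketch only: it assumes each coloring is given as a tuple of color indices (one per vertex) and that colorings differing only by a renaming of colors count as the same coloring; neither convention is stated in this section. The names `canonical` and `distinct_count` are hypothetical.

```python
from collections import Counter

def canonical(coloring):
    """Relabel colors by order of first appearance, so that colorings equal up to
    a permutation of colors map to the same tuple (assumed convention)."""
    relabel = {}
    return tuple(relabel.setdefault(c, len(relabel)) for c in coloring)

def distinct_count(sample):
    """Return (p, counts): p is the number of different colorings in the sample,
    counts[x] is the number of occurrences #(x) of coloring x."""
    counts = Counter(canonical(w) for w in sample)
    return len(counts), counts

if __name__ == "__main__":
    # Four sampled 3-vertex colorings; the first, second and fourth coincide
    # once colors are renamed.
    W = [(0, 1, 0), (1, 0, 1), (0, 1, 2), (0, 1, 0)]
    p, counts = distinct_count(W)
    print(p, len(W), counts)
```

By construction the returned $p$ satisfies $1\leq p\leq t$ for any non-empty sample of size $t$.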

[TABLE]

with $\alpha=1.01$ . Indeed, when $p\ll t$ , $UB(G,k,p,t)\sim 2p$ and when $p$ is close to $t$ , $UB(G,k,p,t)\sim p^{2}$ . Between these extreme values, the cloud of blue dots approximately follows an exponential curve. $UB(G,k,p,t)$ was also built to lie above all blue dots, i.e. it is a valid upper bound for all graphs of the reference dataset. Of course, there is no guarantee that this upper bound remains valid for all other graphs. Therefore, our approach can never prove optimality in a strict sense: it gives only a clue.
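The two regimes ( $p\ll t$ and $p$ close to $t$ ) can be illustrated with the classical occupancy formula. This is not the paper's upper bound, only an idealized sketch: it assumes the $t$ samples are drawn uniformly and independently from the $n=\mathcal{N}(G,k)$ colorings, which HEAD is not guaranteed to do. Under that assumption the expected number of distinct colorings is $n\,(1-(1-1/n)^{t})$ .

```python
def expected_distinct(n, t):
    """Expected number of distinct items seen in t independent uniform draws
    from a set of n items (idealized uniform-sampling assumption)."""
    return n * (1.0 - (1.0 - 1.0 / n) ** t)

if __name__ == "__main__":
    t = 1000
    for n in (10, 100, 1000, 10**4, 10**6):
        # Small n: nearly every coloring is found, so p tracks n.
        # Large n: almost no repeats occur, so p saturates near t.
        print(n, round(expected_distinct(n, t), 1))
```

When $n$ is much smaller than $t$ , the expected $p$ is close to $n$ , so $p$ carries information about $n$ . When $n$ is much larger than $t$ , the expected $p$ is close to $t$ whatever the value of $n$ , which is consistent with $p$ being a poor estimate in that regime.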

5 Experiments and analysis

5.1 Tests

The upper bound $UB$ was built from the graphs of the reference dataset. Now, in order to test the optimality clue (the procedure of Section 3.2), we use this upper bound on the graphs of the test dataset and of the control dataset, and on some graphs from the DIMACS benchmark.

Results on the RCBII benchmark are presented in the last two lines of Table 1. The first column concerns the 862 graphs with more than 1 million optimal solutions, corresponding to the control dataset. There is no false positive: for all these graphs, the procedure of Section 3.2 concludes that there is no optimality clue. The next two columns concern the reference dataset. More precisely, the second column concerns graphs having fewer than 1 million optimal solutions but that do not verify Theorem 1: the number of IS is lower than the number of optimal solutions. Of course, there are no false positives in this case, because $UB$ was built to be valid on those graphs (reference dataset). The third column concerns the 566 graphs verifying Theorem 1. The optimality clue is established for 449 of them because $i(G)>UB(G,k,p,t)>\mathcal{N}(G,k)$ . The optimality clue is not established for the 117 ( $=566-449$ ) other graphs because $UB(G,k,p,t)\geq i(G)>\mathcal{N}(G,k)$ : in this case, the upper bound $UB(G,k,p,t)$ is too high. To establish the optimality clue on those graphs, we would have to increase the size $t$ of the sample of solutions. The fourth column concerns the test dataset, i.e. graphs for which the number of optimal solutions is unknown. We establish the optimality clue for nearly 20% of these graphs (39/210). There are three possible reasons why we did not establish the optimality clue for the other 171 (=210-39) graphs:

graph instances have more than 1 million solutions;
graph instances do not verify Theorem 1;
$p$ is too close to $t$ , so the upper bound $UB$ is too high.

Nothing can be done about the first two reasons. For the third, in order to obtain a more accurate upper bound, i.e. one that is still valid but not too high, we have to increase the size of the sample or choose a formula other than equation (1).

Our approach therefore applies to about 20% of the random graphs in the RCBII benchmark. For the control and reference datasets, we get roughly the same proportion: 25% (449/1821).
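The proportions quoted in this section can be rechecked with a few lines of arithmetic. The sketch below only recomputes ratios from the counts given in the text; the helper name `share` is hypothetical.

```python
def share(part, whole):
    """Percentage of `part` within `whole`."""
    return 100.0 * part / whole

if __name__ == "__main__":
    # Reference dataset: graphs verifying Theorem 1 without an optimality clue.
    print(566 - 449)
    # Test dataset: graphs with an optimality clue, and those without.
    print(round(share(39, 210), 1), 210 - 39)
    # Control and reference datasets together, as quoted in the text.
    print(round(share(449, 1821), 1))
```

The test-dataset ratio comes out slightly below 20% and the control-plus-reference ratio slightly below 25%, in line with the rounded figures above.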

