Procedural Generation of Initial States of Sokoban

D\^amaris S. Bento; Andr\'e G. Pereira; Levi H. S. Lelis

arXiv:1907.02548·cs.AI·July 8, 2019

Procedural Generation of Initial States of Sokoban

D\^amaris S. Bento, Andr\'e G. Pereira, Levi H. S. Lelis

PDF

TL;DR

This paper introduces a system called Beta that uses novel hardness metrics and exploration techniques to generate challenging, solvable Sokoban initial states, surpassing human-designed puzzles in difficulty.

Contribution

The paper presents new hardness metrics based on pattern database heuristics and a novelty-driven search method for generating difficult Sokoban initial states.

Findings

01

Beta generates initial states harder than human-designed puzzles.

02

Hardness metrics effectively guide the search process.

03

Generated states are solvable and suitable for evaluating planning systems.

Abstract

Procedural generation of initial states of state-space search problems have applications in human and machine learning as well as in the evaluation of planning systems. In this paper we deal with the task of generating hard and solvable initial states of Sokoban puzzles. We propose hardness metrics based on pattern database heuristics and the use of novelty to improve the exploration of search methods in the task of generating initial states. We then present a system called Beta that uses our hardness metrics and novelty to generate initial states. Experiments show that Beta is able to generate initial states that are harder to solve by a specialized solver than those designed by human experts.

Tables1

Table 1. Table 1 : Results of states generated by variants of β 𝛽 \beta . The “ h ℎ h -value of s 0 subscript 𝑠 0 s_{0} ” denotes the average heuristic value of the generated initial states; “Number of Conflicts” shows the average number of conflicts, where 2 2 2 C, 3 3 3 C, and 4 4 4 C denote the number of conflicts of order 2 2 2 , 3 3 3 , and 4 4 4 , respectively. “# Solved” denotes the average and standard deviation of the number of problems solved by PRB.

	#	Generation Method	States Expanded	$h$ -value of $s_{0}$				Number of Conflicts			# Solved
	#	Generation Method	States Expanded	$h^{PDB1}$	$h^{PDB2}$	$h^{PDB3}$	$h^{PDB4}$	2C	3C	4C	# Solved
A	1	$[h^{PDB1}]$	36,844,914.09	179.68	185.94	187.88	190.49	6.27	1.93	2.61	33.0 $\pm$ 0.0
	2	$[h^{PDB2}]$	35,746,724.37	178.70	190.33	190.45	193.35	11.63	0.11	2.90	35.0 $\pm$ 0.0
	3	$[h^{PDB3}]$	16,762,063.67	166.28	175.87	178.09	180.63	9.58	2.22	2.54	33.8 $\pm$ 1.8
	4	$[h^{PDB4}]$	19,683,953.17	168.01	177.78	179.26	183.63	9.77	1.48	4.37	32.4 $\pm$ 2.8
B	5	$[h^{PDB2}, 2 C]$	35,279,521.51	179.08	191.31	191.32	194.31	12.23	0.01	2.99	36.0 $\pm$ 0.0
	6	$[h^{PDB3}, 3 C]$	14,896,276.53	167.41	176.91	179.70	182.31	9.49	2.79	2.62	33.2 $\pm$ 1.9
	7	$[h^{PDB4}, 4 C]$	10,127,736.49	157.09	166.24	167.84	172.41	9.15	1.60	4.57	33.0 $\pm$ 2.5
C	8	$[2 C, h^{PDB2}]$	23,465,651.94	136.89	158.97	154.48	158.78	22.08	0	4.30	26.0 $\pm$ 0.0
	9	$[3 C, h^{PDB3}]$	10,075,985.22	111.44	117.76	126.85	127.69	6.31	9.10	0.83	28.8 $\pm$ 0.8
	10	$[4 C, h^{PDB4}]$	6,066,057.87	101.95	111.48	112.42	123.57	9.54	0.94	11.15	31.0 $\pm$ 4.9
D	11	$[w (h^{PDB1}), h^{PDB1}]$	24,852,925.84	223.43	231.11	233.79	237.11	7.68	2.68	3.32	26.6 $\pm$ 0.5
	12	$[w (h^{PDB2}), h^{PDB2}]$	23,764,256.42	219.52	233.19	234.14	238.07	13.67	0.96	3.92	29.8 $\pm$ 0.4
	13	$[w (h^{PDB3}), h^{PDB3}]$	12,320,342.90	215.49	226.89	230.16	233.44	11.40	3.27	3.28	25.4 $\pm$ 1.1
	14	$[w (h^{PDB4}), h^{PDB4}]$	12,643,046.69	190.76	202.03	204.12	210.70	11.27	2.08	6.59	25.8 $\pm$ 1.5
E	15	$[w (h^{PDB2}), h^{PDB2}, 2 C]$	26,994,420.86	216.29	230.81	231.22	235.19	14.52	0.40	3.97	27.0 $\pm$ 0.0
	16	$[w (h^{PDB3}), h^{PDB3}, 3 C]$	11,842,771.67	211.28	222.30	225.45	228.89	11.02	3.15	3.45	26.0 $\pm$ 1.2
	17	$[w (h^{PDB4}), h^{PDB4}, 4 C]$	7,650,277.40	207.38	218.77	221.24	227.21	11.39	2.47	5.97	26.2 $\pm$ 2.9
F	18	$[w (h^{PDB2}), 2 C, h^{PDB2}]$	15,280,827.99	175.46	201.34	196.69	202.62	25.89	0	5.93	22.8 $\pm$ 0.4
	19	$[w (h^{PDB3}), 3 C, h^{PDB3}]$	7,412,556.34	154.76	162.73	175.78	175.85	7.96	13.06	0.07	20.6 $\pm$ 2.1
	20	$[w (h^{PDB4}), 4 C, h^{PDB4}]$	3,986,192.80	148.37	159.96	161.09	176.33	11.60	1.14	15.24	19.5 $\pm$ 2.4
	21	Aggregation	79,699,027.84	151.46	163.07	161.78	187.00	11.63	0.77	25.22	16.4 $\pm$ 1.3
-		Humans (xSokoban)	-	188.02	199.58	205.62	211.41	11.56	6.04	5.79	18

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Procedural Generation of Initial States of Sokoban

Dâmaris S. Bento,1 André G. Pereira2 &Levi H. S. Lelis1 1Universidade Federal de Viçosa, Brazil

2Universidade Federal do Rio Grande do Sul, Brazil

{damaris.bento, levi.lelis}@ufv.br, [email protected]

Abstract

Procedural generation of initial states of state-space search problems have applications in human and machine learning as well as in the evaluation of planning systems. In this paper we deal with the task of generating hard and solvable initial states of Sokoban puzzles. We propose hardness metrics based on pattern database heuristics and the use of novelty to improve the exploration of search methods in the task of generating initial states. We then present a system called $\beta$ that uses our hardness metrics and novelty to generate initial states. Experiments show that $\beta$ is able to generate initial states that are harder to solve by a specialized solver than those designed by human experts.

1 Introduction

The development of search solvers for state-space search problems is a traditional area of research in artificial intelligence (AI), where notable progress has been made (e.g., Culberson and Schaeffer (1996); Junghanns and Schaeffer (2001); Hoffmann and Nebel (2001); Holte et al. (2016)). By contrast, systems for generating problems have achieved limited success. Generation systems are often able to generate only small and easy problems, thus limiting their applicability.

Systems able to automatically generate solvable state-space search problems have applications in human and machine learning. These systems can be used to generate problems from which humans Anderson et al. (1985) and systems Arfaee et al. (2011); Lelis et al. (2016); Racanière et al. (2017); Orseau et al. (2018) can learn. Automatically generated problems are also used to evaluate planning systems. Problems used in the International Planning Competition are usually generated using domain knowledge and their solvability verified with specialized solvers. The generation of solvable and hard problems can be challenging, since it could be \PSPACE-complete to decide if a problem is solvable. Moreover, it is unclear what makes a problem hard in practice.

In this paper we focus on the task of generating hard and solvable initial states of Sokoban. We introduce hardness metrics based on pattern databases (PDBs) Culberson and Schaeffer (1996) motivated by hardness metrics known to correlate with the time required by humans to solve problems Jarušek and Pelánek (2010). We also propose the use of novelty Lipovetzky and Geffner (2012) to improve the exploration in the task of generating initial states. Finally, we introduce a system called $\beta$ that performs a search guided by our metrics of hardness and novelty, and that also uses our metrics to select hard and solvable initial states.

Although $\beta$ is in principle general and applicable to other problem domains, we focus on Sokoban for several reasons. First, Sokoban is a \PSPACE-complete problem Culberson (1999) and a traditional testbed for search methods. Second, in Sokoban most of the states in its space are unsolvable, thus making the problem of generating solvable initial states particularly challenging. Third, Sokoban has a community of expert designers who create hard problems and make them available online, thus allowing a comparison between human-designed initial states with initial states generated by $\beta$ .

Our empirical results show that $\beta$ is able to generate solvable initial states that are harder to solve by a specialized solver than those designed by human experts. To the best of our knowledge, our work is the first to compare the hardness of initial states generated by a computer system with those designed by human experts.

Contributions

We introduce computationally efficient hardness metrics and show empirically how these metrics can be used to guide a search algorithm and to select hard instances from a pool of options. Another contribution is to show that novelty Lipovetzky and Geffner (2012) is essential for generating a diverse pool of solvable instances from which hard ones can be selected. The combination of our metrics and novelty resulted in the first generative system for Sokoban with superhuman performance in terms of instance difficulty for a specialized solver.

2 Related Work

Related to our work are papers presenting methods for generating entire problems of Sokoban, and more generally, works dealing with automatic content generation. Also related to our work are learning systems that require the generation of a set of states from which one can learn a value or a heuristic function. Next, we briefly survey each of these related areas.

Murase et al. Murase et al. (1996), Taylor and Parberry Taylor and Parberry (2011), and Kartal et al. Kartal et al. (2016) presented methods for generating complete Sokoban problems, where a complete problem is composed of a maze and an initial state. Although the problem of generating complete problems is interesting, we focus on the task of generating initial states and assume that the maze is provided as input. This way we can directly compare the initial states $\beta$ generates with those designed by human experts on a fixed set of mazes. Further, $\beta$ can be used to generate initial states for the problems generated by previous systems.

The systems of Murase et al. Murase et al. (1996) and Taylor and Parberry Taylor and Parberry (2011) were used to generate sets of instances to be employed in the learning process of reinforcement learning (RL) algorithms Racanière et al. (2017). The drawback of using these generative systems is that they are unable to generate large problems, thus limiting RL algorithms to small and easy problems. Since our method generates hard initial states for large Sokoban problems, it will allow RL algorithms to be evaluated in more challenging problems. In the same line of research, Justesen et al. Justesen et al. (2018) showed how systems that generate video game levels can enhance deep RL algorithms.

Arfaee et al. Arfaee et al. (2011) describe a system that generates a set of initial states of a given problem and learns a heuristic function while solving this set of states with IDA∗ Korf (1985). Arfaee et al. experimented with domains for which it is easy to generate a set of solvable states with varied difficulty (e.g., sliding-tile and pancake puzzles). Before our work, Arfaee et al.’s system could not be directly applied to problem domains in which it is hard to generate difficult and solvable states, such as Sokoban. The combination of Arfaee et al.’s system with $\beta$ is a promising direction of future research.

Procedural Content Generation (PCG) methods are used to generate content such as game levels Snodgrass and Ontañón (2017), puzzles Smith and Mateas (2011); Sturtevant and Ota (2018), and educational problems Polozov et al. (2015). Many of the problem domains used in the PCG literature (e.g., math problems for children) are computationally easier than the \PSPACE-complete domain we consider in this paper. Moreover, in contrast with our work, PCG systems are often concerned with properties such as the enjoyability of the problems generated Mariño and Lelis (2016), as opposed to the scalability of the system. Finally, to the best of our knowledge, $\beta$ is the first PCG method employing concepts such as PDB and novelty to the task of problem generation.

3 Background and Problem Definition

Although we focus on the generation of initial states of Sokoban puzzles, our system is general and we describe it as such. Let $\mathcal{P}=\langle S,s_{0},S^{*},A\rangle$ be a state-space search problem. A state is a complete assignment over a set of variables $\mathcal{V}$ , where each variable $v\in\mathcal{V}$ has a finite set domain $D_{v}$ . Let $S$ be a set of states, $s_{0}\in S$ an initial state, $S^{*}\subseteq S$ a set of goal states, $A$ a finite set of actions. An action $a\in A$ is a pair $\langle pre_{a},post_{a}\rangle$ specifying the preconditions and postconditions (effect) of action $a$ . $pre_{a}$ and $post_{a}$ are assignments of values to a subset of variables in $\mathcal{V}$ , denoted $\mathcal{V}_{pre}$ and $\mathcal{V}_{post}$ . Action $a$ is applicable to state $s$ , denoted $s\llbracket a\rrbracket$ , if $s$ agrees with $\mathcal{V}_{pre}$ . The result of $s\llbracket a\rrbracket$ is state $s^{\prime}$ that agrees with $post_{a}$ for the variables in $\mathcal{V}_{post}$ and with $s$ for the remaining variables. We say that the application of $a$ to $s$ generates state $s^{\prime}$ . We refer to the states that can be generated from state $s$ as successor states, denoted $\operatorname{succ}(s)$ . We assume that for each action $a\in A$ there is an inverse action $a^{-1}\in A^{-1}$ such that $s^{\prime}=a\llbracket s\rrbracket$ iff $s=a^{-1}\llbracket s^{\prime}\rrbracket$ . Here, $A^{-1}$ is the set of inverse actions that can applied to $s$ , which yield a set of predecessor states, denoted $\operatorname{pred}(s)$ . A state $s$ is expanded when either $\operatorname{succ}(s)$ or $\operatorname{pred}(s)$ is invoked.

A solution path is a sequence of actions that transforms $s_{0}$ in a state $s_{*}\in S^{*}$ . An optimal solution path is a solution path with minimum length. $h(s)$ is a heuristic function that estimates the length to reach a goal state from $s$ , with $h^{*}$ providing the optimal solution length for all states $s\in S$ . A heuristic $h$ is admissible if $h(s)\leq h^{*}(s)$ for all $s\in S$ .

We can now define the task we are interested in solving.

Definition 1 (initial state generation).

An initial state generation task is defined as $\mathcal{P}_{-s_{0}}=\langle S,S^{*},A\rangle$ . $\mathcal{P}_{-s_{0}}$ is a state-space search problem $\mathcal{P}$ without a initial state $s_{0}$ . In problem $\mathcal{P}_{-s_{0}}$ one has to provide a state $s\in S$ such that $P_{-s_{0}}$ with $s$ as initial state is solvable.

4 Pattern Databases

A pattern database (PDB) is a memory-based heuristic function defined by a pattern $V$ for a problem $\mathcal{P}$ Culberson and Schaeffer (1996); Edelkamp (2001). A pattern is a subset of variables $V\in\mathcal{V}$ that induces an abstracted state-space of $\mathcal{P}$ . The abstracted space is obtained by ignoring the information of variables $v\notin V$ . For a given pattern, the optimal solutions of all states in the induced abstracted space is computed and stored in a look-up table. These stored values can then be used as a heuristic function estimating $h^{*}$ . The combination of multiple PDB heuristics can result in better estimates of $h^{*}$ than the estimates provided by each PDB alone. PDBs can be combined by taking the maximum Holte et al. (2006) or the sum Felner et al. (2004) of multiple PDB estimates. The heuristic function resulting from the sum of multiple PDBs is called an additive PDB heuristic. An additive PDB is admissible if no action affects more than one pattern $V$ , where an action $a$ affects $V$ if $a$ has an effect on any of its variables, i.e., $\mathcal{V}_{post_{a}}\cap V\neq\emptyset$ . Consider $\mathcal{P}_{e}$ , the search problem shown in Figure 1, and the example that follows.

$\mathcal{P}_{e}$ has three variables that can be assigned a value in $\{0,1,2,3,4\}$ . The initial state assigns [math] to all variables, and the goal state has all variables with the value of $3$ . $\mathcal{P}_{e}$ has two types of actions $inc^{v_{i}}_{x}$ and $jump_{v_{i}}$ . Actions $inc^{v_{i}}_{x}$ change the value of the variable $v_{i}$ from $x$ to $x+1$ . Actions $jump_{v_{i}}$ change the value of variable $v_{i}$ from [math] to $3$ , as long as all the other variables have the value of $4$ . All solutions to this problem involve the use of action $inc^{v_{i}}_{x}$ three times to all three variables, yielding an optimal solution length of $9$ . $jump_{v_{i}}$ cannot be used in any solution because if a variable has the value of $4$ , the problem becomes unsolvable since there is no action to change its value.

Example 1.

Let $\{\{v_{1}\},\{v_{2},v_{3}\}\}$ be a collection with two additive patterns for $\mathcal{P}_{e}$ . The heuristic estimate returned for $\mathcal{P}_{e}$ ’s $s_{0}$ with a PDB heuristic with pattern $\{v_{1}\}$ is $1$ , as a single action $jump$ changes the value of $v_{1}$ from $1$ to $3$ , thus achieving the goal in the PDB’s abstracted space. The heuristic estimate returned for $s_{0}$ for pattern $\{v_{2},v_{3}\}$ is $6$ . This is because the action $jump$ is no longer applicable. Since the two patterns are additive, an additive PDB with $\{\{v_{1}\},\{v_{2},v_{3}\}\}$ returns $1+6=7$ as an estimate of the solution cost of $s_{0}$ .

5 Proposed Search Algorithm for $\mathbf{\beta}$

Given a problem $\mathcal{P}_{-s_{0}}$ , $\beta$ performs a backward greedy best-first search Doran and Michie (1966) from the set $S^{*}$ . This is performed by inserting all states $s_{*}\in S^{*}$ in a priority queue denoted open and in a set of generated states denoted closed. $\beta$ expands states from open according to an order $[\cdot]$ , which $\beta$ receives as input. When a state $s$ is removed from open, $\beta$ expands $s$ invoking $\operatorname{pred}(s)$ . Every $s^{\prime}\in\operatorname{pred}(s)$ that is not in closed is inserted in open and closed.

The advantage of performing a backward search is that all states generated that way are guaranteed to have a solution. By contrast, if one assigned random values from the variables’ domains as a means of generating initial states, then one would need to prove the resulting $\mathcal{P}$ problem to be solvable, and in some domains, this task is computationally hard. $\beta$ returns the state $s$ encountered in search with largest objective value once a time or a memory limit is reached. We specify below the objective functions used with $\beta$ .

5.1 Solution Length

We propose two metrics of difficulty: solution length and conflicts. The optimal solution length is traditionally used to evaluate the hardness of initial states for AI methods. For example, solution length correlates with the running time of heuristic search algorithms in problems such as Rubik’s Cube Lelis et al. (2013). We use PDB heuristics to approximate the optimal solution length of state-space problems.

5.2 Conflicts

Intuitively, the hardness of state-space problems arises from the interactions among variables. If sub-problems defined by single variables of the problem can be solved independently, then the problem tends to be easy. However, if one needs to consider interactions between all variables of the problem, then the problem tends to be hard. We use PDBs to capture the interaction of variables in a problem. Let $h^{\text{PDB}_{k}}$ be a PDB heuristic with patterns of size $k$ . The conflicts of state $s$ is defined as follows.

Definition 2 (conflicts).

The number $n$ of conflicts of order $k$ , denoted $kC$ , for $k>1$ of a state $s$ is $n=h^{\text{PDB}_{k}}(s)-h^{\text{PDB}_{k-1}}(s)$ , if $n>0$ ; $n=0$ , otherwise.

Suppose we have the following PDB heuristics for our running example: $h^{\text{PDB1}}$ , $h^{\text{PDB2}}$ and $h^{\text{PDB3}}$ . $h^{\text{PDB1}}$ uses three additive patterns $\{\{v_{1}\},\{v_{2}\},\{v_{3}\}\}$ , which results in $h^{\text{PDB1}}(s_{0})=3$ (the action $jump$ is applicable to each pattern). $h^{\text{PDB2}}$ uses two additive patterns, one with two variables and another with one variable: $\{\{v_{1}\},\{v_{2},v_{3},\}\}$ . For these patterns $h^{\text{PDB2}}(s_{0})=7$ . Finally, $h^{\text{PDB3}}$ uses one pattern with all three variables and returns the optimal solution of $9$ .

Using these PDB-values we can compute the conflicts of $s_{0}$ . $s_{0}$ has four conflicts of order two, denoted $2C$ ( $h^{\text{PDB2}}(s_{0})-h^{\text{PDB1}}(s_{0})=4$ ) and two conflicts of order three, denoted $3C$ ( $h^{\text{PDB3}}(s_{0})-h^{\text{PDB2}}(s_{0})=2$ ). The number of $2C$ conflicts approximates the effort in terms of solution length due to the interaction of pairs of variables. In general, the number of $kC$ conflicts approximates the effort due to the interaction of $k$ variables.

5.3 Novelty

Next, we discuss novelty, which we use to enhance the exploration of our search algorithm. Novelty $w$ is a structural measure proposed to enhance the exploration of search algorithms in the task of finding solutions for search problems Lipovetzky and Geffner (2012, 2017). Novelty evaluates states with respect to the order in which an algorithm generates them.

Definition 3 (novelty).

Let $\mathcal{P}$ be a state-space search problem, $s\in S$ be a state generated by a search algorithm, and $h$ be a heuristic function. The novelty of $s$ for $h$ is $|\mathcal{V}|-n+1$ if $s$ is the first generated state during search with heuristic value $h(s)$ and with a subset of $n$ variables being assigned values $d_{v_{1}},\cdots,d_{v_{n}}$ and no smaller subset of variables has this property.

The novelty of a state $s$ generated during the search is maximal if, for the first time in search, $s$ has a variable $v$ for which the value $d_{v}\in D_{v}$ is assigned. The novelty of $s$ is minimum if $n=|\mathcal{V}|$ , which means that it requires all variables in $\mathcal{V}$ to differ $s$ from previously generated states. Intuitively, a state has larger novelty if it requires a smaller number of variables to assume a previously unseen assignment of values.111Our semantics for novelty differs from Lipovetzky and Geffner’s semantics because they dealt with minimization problems while we deal with a maximization problem.

Example 2.

Suppose we run $\beta$ in $\mathcal{P}_{e}$ with the ordering $[\cdot]$ defined by novelty and that all states have the same heuristic value. State $s_{*}$ has novelty $w(s_{*})=3$ because this is the first state to assign $3$ to $v_{1}$ . Then, $\beta$ invokes $\operatorname{pred}(s_{*})$ and generates three states $s_{1},s_{2},s_{3}$ by applying the $inc^{-1}$ (the inverse of $inc$ ) actions to each variable. For simplicity, we will assume that $jump^{-1}$ is not applicable. $s_{1},s_{2},s_{3}$ have $w=3$ because each of them assigns $2$ to a variable for the first time. Next, suppose $\beta$ expands $s_{1}=\{v_{1}=2,v_{2}=3,v_{3}=3\}$ , then it generates three states $\{v_{1}=1,v_{2}=3,v_{3}=3\}$ , $\{v_{1}=2,v_{2}=2,v_{3}=3\}$ and $\{v_{1}=2,v_{2}=3,v_{3}=2\}$ , where the last two states have $w=2$ because the smallest subset with newly assigned values is two: $\{v_{1}=2,v_{2}=2\}$ for the second and $\{v_{1}=2,v_{3}=2\}$ for the third.

We hypothesize that novelty can improve the exploration of the search performed by $\beta$ , thus allowing it to visit a diverse set of states from which one can select initial states that maximize solution length and number of conflicts.

6 Difficulty Metrics for Humans in Sokoban

Jarušek and Pelánek Jarušek and Pelánek (2010) performed an extensive user study showing strong positive correlations between metrics of difficulty they introduced with the time required by humans to solve Sokoban problems. In this section we introduce the domain of Sokoban and present their metrics.

6.1 Sokoban

Sokoban is a transportation problem played in a maze grid. The maze is defined by locations occupied by immovable blocks (walls), free (inner) locations and goal (inner) locations. An initial state of Sokoban is a placement of $k$ boxes and a man in inner locations of the maze. The goal is to push, through the man, all boxes from their initial location to a goal location while minimizing the number of pushes.

Figure 2 shows a Sokoban maze with a wall at A $1$ , an free location at C $3$ and goal location at B $4$ . An initial state is defined by the assignment of the locations of boxes and the man. A state has a $k+1$ variables, one for each box and another for the man. The instance has $k$ goal locations, one for each box. The domain of each variable is the set of inner locations. Goal states have all $k$ boxes at one of the $k$ goal locations.

Figure 2 shows a box at C $2$ and the man at D $2$ . We name goal locations and boxes by their locations in the maze. For example, the box at C $2$ is box C $2$ . An action corresponds to the man pushing a box to an adjacent empty inner location. The precondition of an action is that the man must be able to reach the orthogonal location adjacent to the box. For example, in Figure 2 box C $2$ can be pushed to B $2$ , but box D $3$ cannot be pushed anywhere.

6.2 [Jarušek and

Pelánek](#bib.bib11)’s Metrics

The first metric introduced by Jarušek and Pelánek Jarušek and Pelánek (2010) was solution length, which is exactly what we approximate with PDBs. Jarušek and Pelánek showed that the exact solution length yields a Spearman correlation of $0.47$ with the time required by humans to solve Sokoban problems.

Jarušek and Pelánek Jarušek and Pelánek (2010) introduced a metric named “number of counterintuitive moves”. In Figure 2, while it is intuitive to push box C $2$ to a goal location (push it to the left), the pushes required to move boxes D $3$ and D $4$ to a goal location are “counterintuitive”. This is because one needs to push either D $3$ or D $4$ away from a goal location before pushing them to a goal location. These counterintuitive pushes can be approximated by our metric of conflicts. In Figure 2, a PDB that considers only one box estimates that one can directly push all three boxes to the goal, yielding an estimate of $1+2+2=5$ pushes: one for C $2$ and two for D $3$ and D $4$ . A PDB that considers two boxes captures the interaction between D $3$ and D $4$ . Such a PDB discovers that one needs to push either D $3$ or D $4$ two locations to the right, before pushing them to a goal location, for an estimated solution length of $1+6+2=9$ . The difference $9-5=4$ is the number of Jarušek and Pelánek’s counterintuitive moves captured by our metric of conflicts. Jarušek and Pelánek showed that the exact number of conflicts has a Spearman correlation of $0.69$ with the time required by humans to solve Sokoban problems.

Jarušek and Pelánek Jarušek and Pelánek (2010) also introduced “problem decomposition”, a metric that measures the degree in which subsets of boxes can be independently pushed to the goal. This metric yields a Spearman correlation of 0.82 with the time required by humans to solve the puzzles. Our order $k$ of conflicts approximates similar property. If $k$ is the largest value for which $h^{\text{PDB}_{k}}(s)-h^{\text{PDB}_{k-1}}(s)>0$ for all possible PDBs with $k$ and $k-1$ variables, then the largest number of variables with counterintuitive moves in $s$ is $k$ . Although one might be unable to divide $s$ into independent subproblems with $k$ variables (e.g., the order in which the subproblems are solved might matter), one is able to reason about the subproblems with at most $k$ variables somewhat independently.

The metrics introduced by Jarušek and Pelánek Jarušek and Pelánek (2010) are computationally hard in general as they require one to solve Sokoban problems to compute their values. By contrast, our metrics can be computed efficiently and thus be used within search procedures for generating hard initial states.

7 Empirical Evaluation

We use the $90$ problems of the xSokoban benchmark in our experiments. These problems were developed by human experts and are known to be challenging to both humans and AI methods. The xSokoban benchmark allows us to directly compare the initial states our method generates with those created by human experts in terms of the metrics proposed and number of problems solved by a solver. Moreover, the $\mathcal{P}_{-s_{0}}$ problems present in xSokoban contain many more boxes (the largest problem has $34$ boxes) than the problems generated by the current generative methods (the largest problem reported in the literature has $7$ boxes).

We instantiate variants of $\beta$ by varying the ordering $[\cdot]$ in which the states are expanded and how initial states are selected. $[\cdot]$ is composed by a list of features, which can include novelty $w(h)$ , a PDB heuristic $h^{\text{PDBk}}$ , and the number of conflicts of order $k$ , denoted $kC$ . $\beta$ returns the state generated with largest ordering value, ignoring novelty, once the search reaches the time or memory limit. For example, $\beta$ with ordering $[w(h^{\text{PDB4}}),4C,h^{\text{PDB4}}]$ expands states with largest $w(h^{\text{PDB4}})$ first, with the ties broken according to $4$ C and then according to $h^{\text{PDB4}}$ . $\beta$ returns the state $s$ with largest $4$ C-value, with ties broken according to $h^{\text{PDB4}}$ ; remaining ties are broken by the state generation order.

Heuristic $h^{\text{PDBk'}}$ returns the maximum heuristic value for a set with $|\mathcal{V}|+1$ additive PDBs. We use this number of PDBs to ensure that instances with more variables have more PDBs. The patterns of each of the $|\mathcal{V}|+1$ additive PDBs are defined by iteratively and randomly selecting without replacement $k^{\prime}$ variables from $\mathcal{V}$ , until all variables are in a pattern. If $|\mathcal{V}|\mod k^{\prime}>0$ , then the last pattern will have $|\mathcal{V}|\mod k^{\prime}$ variables. All patterns include the man.

The solver we use to evaluate the hardness of the instances generated by $\beta$ and by human experts is the one introduced by Pereira et al. Pereira et al. (2015), which we refer to as PRB.

We are primarily interested in comparing the states generated $\beta$ with those designed by human experts in terms of problems solved by PRB. We also present the average number of states expanded by each variant of $\beta$ while generating the 90 initial states, the average $h$ -value of the initial states generated in terms of $h^{\text{PDB1}}$ , $h^{\text{PDB2}}$ , $h^{\text{PDB3}}$ and $h^{\text{PDB4}}$ , and the average number of conflicts 2C, 3C, and 4C. We also experimented with random walks (RW) and breadth-first search (BFS) as the search algorithms used by $\beta$ and PRB solves almost all problems generated by these approaches (approximately 85 out of 90). This is because both methods generate initial states with very short solution length. RW and BFS results are omitted from the table of results for clarity.

All experiments are run on $2.66$ GHz CPUs, $\beta$ is allowed 1 hour of computation time and $8$ GB of memory, while the solver is allowed $10$ minutes and $4$ GB (more memory and running time have a small impact on PRB). Due to the randomness of the PDBs, we report the average results of $5$ runs of each approach. We also report the standard deviation for the number of problems PRB solves. Table 1 presents the results of our proposed method. The third column of the table (“Generation Method”) shows the order used to guide the $\beta$ search and to select initial states in our experiments. We also present the number of states $\beta$ expanded, number of problems solved by PRB, and the $h$ -values and conflict values, the table also presents a column “#” indicating the row of each method, which are used as a reference in our discussion. We highlight in bold of cells with the best result for a given criterion. For example, $[w(h^{\text{PDB1}}),h^{\text{PDB1}}]$ is the best performing method in terms of value of $h^{\text{PDB1}}$ .

7.1 Quantitative Results

We start by comparing the $\beta$ results without novelty (see blocks $A$ , $B$ and $C$ in Table 1) with those with novelty (blocks $D$ , $E$ and $F$ ). Methods that use novelty outperform their counterparts in all criteria: average $h$ -value, number of conflicts, and the total number of problems solved. Compare, for example, the results in rows $1$ – $4$ with their counterparts in rows $11$ – $14$ . In particular, $[w(h^{\text{PDB1}}),h^{\text{PDB1}}]$ obtained an average $h^{\text{PDB1}}$ -value of $223.43$ , which is larger than the value of $188.02$ for the average of the xSokoban initial states. Although $[w(h^{\text{PDB1}}),h^{\text{PDB1}}]$ can generate initial states with larger $h^{\text{PDB1}}$ values than the xSokoban, its initial states tend to be much easier in terms of the number of problems solved than the xSokoban. PRB solves at least seven more initial states generated by the methods in block $D$ than the initial states created by human experts.

The methods shown in blocks $B$ and $E$ maximize the heuristic value and break ties by conflict values. Methods in blocks $C$ and $F$ use the opposite order of metrics. The methods in $B$ generate initial states with longer solution lengths than the methods in $C$ . However, the methods in $C$ have a larger number of conflicts, according to the optimized criteria. For example, the method in row $8$ , which prioritizes the maximization of $2$ C-values, generates initial states with larger $2$ C-values than its counterpart in row $5$ . In general, the initial states generated by the methods in $C$ tend to be much harder in terms of the number of problems solved by PRB. These results demonstrate the importance of generating initial states with a large number of conflicts. The same pattern is observed by comparing the results of the methods in $E$ with those of the methods in $F$ . The results in $F$ highlight the importance of combining the maximization of conflicts of higher order with novelty to enhance exploration. The method shown in line 20 is already competitive with the xSokoban benchmark in terms of number of problems solved by PRB.

The method presented in line $21$ shows the average numbers of $5$ runs of an aggregation procedure. One run of this procedure represents $20$ runs of $[w(h^{\text{PDB4}}),4C,h^{\text{PDB4}}]$ . Then, out of the $20$ runs, we select for each maze the initial state that maximizes $[4C,h^{\text{PDB4}}]$ . This procedure further optimizes the average 4C-value (25.22) at the cost of a larger number of state expansions and it outperforms the initial states designed by humans in terms of number of problems PRB is able to solve. PRB solves 18 xSokoban instances and only an average of 16.4 instances of the aggregation procedure.

The metric that correlated to most with the time required by humans to solve Sokoban problems in Jarušek and Pelánek’s study was “problem decomposition,” which is similar to our order of conflict. Our results suggest that states with many conflicts of high order tend to be harder to solve by PRB. Taken together with Jarušek and Pelánek’s user study, our results suggest that some of the properties that make problems hard for humans can also make them hard for AI systems.

7.2 Qualitative Results

Figure 3 shows representative initial states for two xSokoban mazes. The initial states shown at the top were generated by a representative run of $[w(h^{\text{PDB4}}),4C,h^{\text{PDB4}}]$ and at the bottom are the original xSokoban problems. These initial states are visually similar since there are no obvious features that distinguish those states generated by $\beta$ from those created by humans. The states generated by our method have larger values of $h^{\text{PDB4}}$ for these mazes: $219$ and $223$ , while the values are $167$ and $202$ for xSokoban. PRB solves the xSokoban problem for maze #53, but does not solve the $\beta$ state for the same maze. Although both initial states have the same $4$ C-value for maze #53 ( $=2$ ), our initial state has a much longer solution length, which could explain the result. The situation is reversed for maze #62, where PRB solves the initial state generated by our approach, but does not solve the initial state designed by humans, and both states also have the same $4$ C-value ( $=6$ ). We conjecture that the xSokoban state for maze #62 has conflicts of order higher than $4$ , which our PDBs are unable to capture. These conflicts of higher order could make the xSokoban problem harder to solve than the one generated by our system.

Currently, no solver is able to solve all xSokoban problems. Yet, humans can solve all of them. Thus, humans are still superior to AI methods in the task of solving Sokoban problems. By contrast, our results show that $\beta$ achieves superhuman performance in the task of generating initial states in terms of instance difficulty for a solver.

Although we evaluated $\beta$ only on Sokoban, we believe it can generalize to other domains. This is because all techniques used in our system are Sokoban independent. Moreover, Sokoban is a prototypical problem of a large class of problems known as transportation problems. Problems in this class has four central characteristics: there is a set of connected locations, a set of movable objects and a set of goal locations, and the solution is a sequence of actions that move (a subset of) the movable objects to (a subset of) the goal locations. Sokoban presents all these characteristics thus our system is likely to perform well in these problems as well.

8 Conclusions

In this paper we presented a system for generating hard and solvable initial states of Sokoban. Our system $\beta$ performs a backward search from the goal states of a problem and selects states using PDB-based hardness metrics. We also proposed the use of novelty to guide $\beta$ ’s search. Compared to the states designed by human experts, $\beta$ is able to generate states that are harder to solve by a solver. $\beta$ is the first generative system of initial states of Sokoban with superhuman performance in terms of difficulty for a solver.

Acknowledgments

André Pereira acknowledges support from FAPERGS with project $17/2551-0000867-7$ and Levi Lelis acknowledges support from FAPEMIG and CNPq. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Bibliography30

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Anderson et al. [1985] John R Anderson, C Franklin Boyle, and Brian J Reiser. Intelligent tutoring systems. Science , 228:456–462, 1985.
2Arfaee et al. [2011] Shahab Jabbari Arfaee, Sandra Zilles, and Robert Craig Holte. Learning heuristic functions for large state spaces. Artificial Intelligence , 175:2075–2098, 2011.
3Culberson and Schaeffer [1996] Joseph C. Culberson and Jonathan Schaeffer. Searching with pattern databases. In Canadian Conference on Artificial Intelligence , pages 402–416, 1996.
4Culberson [1999] Joseph C. Culberson. Sokoban is PSPACE-Complete. In Fun With Algorithms , pages 65–76, 1999.
5Doran and Michie [1966] Jim E Doran and Donald Michie. Experiments with the graph traverser program. In Royal Society of London A , volume 294, pages 235–259. The Royal Society, 1966.
6Edelkamp [2001] Stefan Edelkamp. Planning with pattern databases. In European Conference on Planning , pages 13–24, 2001.
7Felner et al. [2004] Ariel Felner, Richard E. Korf, and Sarit Hanan. Additive pattern database heuristics. Journal of Artificial Intelligence Research , 22:279–318, 2004.
8Hoffmann and Nebel [2001] Jörg Hoffmann and Bernhard Nebel. The ff planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research , 14(1):253–302, 2001.

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

Procedural Generation of Initial States of Sokoban

Abstract

1 Introduction

Contributions

2 Related Work

3 Background and Problem Definition

Definition 1** (initial state generation).**

4 Pattern Databases

Example 1**.**

5 Proposed Search Algorithm for β\mathbf{\beta}β

5.1 Solution Length

5.2 Conflicts

Definition 2** (conflicts).**

5.3 Novelty

Definition 3** (novelty).**

Example 2**.**

6 Difficulty Metrics for Humans in Sokoban

6.1 Sokoban

6.2 [Jarušek and

7 Empirical Evaluation

7.1 Quantitative Results

7.2 Qualitative Results

8 Conclusions

Acknowledgments

Definition 1 (initial state generation).

Example 1.

5 Proposed Search Algorithm for $\mathbf{\beta}$

Definition 2 (conflicts).

Definition 3 (novelty).

Example 2.