Constrained optimization under uncertainty for decision-making problems:   Application to Real-Time Strategy games

Valentin Antuori; Florian Richoux

arXiv:1901.00942·cs.AI·May 24, 2022


TL;DR

This paper introduces a method to incorporate uncertainty into combinatorial optimization problems using classical Constraint Programming by integrating Rank Dependent Utility, demonstrated through a competitive game-playing bot.

Contribution

It presents a novel approach to handle uncertainty within traditional Constraint Programming frameworks without developing new formalisms or solvers.

Findings

01

Successfully integrated uncertainty handling into standard Constraint Programming.

02

Developed a competitive game-playing bot for the 2018 $\mu$RTS AI competition.

03

Showed that existing solvers can address complex decision-making under uncertainty.

Abstract

Decision-making problems can be modeled as combinatorial optimization problems with Constraint Programming formalisms such as Constrained Optimization Problems. However, few Constraint Programming formalisms can deal with both optimization and uncertainty at the same time, and none of them are convenient to model problems we tackle in this paper. Here, we propose a way to deal with combinatorial optimization problems under uncertainty within the classical Constrained Optimization Problems formalism by injecting the Rank Dependent Utility from decision theory. We also propose a proof of concept of our method to show it is implementable and can solve concrete decision-making problems using a regular constraint solver, and propose a bot that won the partially observable track of the 2018 $\mu$RTS AI competition. Our result shows it is possible to handle uncertainty with regular…

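Tables

Table 1: Results of 100 games played against the LightRush bot on three small maps. In bold, results with the highest score for each map.

| Method | Result | 8x8 map | 12x12 map | 16x16 map |
|---|---|---|---|---|
| Baseline | Win | 14 | 38 | 50 |
| | Tie | 0 | 2 | 12 |
| | Loss | 86 | 60 | 38 |
| | Score | 14 | 39 | 56 |
| Expected utility | Win | 27 | 35 | 52 |
| | Tie | 2 | 6 | 9 |
| | Loss | 71 | 59 | 39 |
| | Score | 28 | 38 | 56.5 |
| RDU with optimistic $\phi$ | Win | 23 | 44 | 55 |
| | Tie | 10 | 7 | 13 |
| | Loss | 67 | 49 | 32 |
| | Score | 28 | 47.5 | **61.5** |
| RDU with pessimistic $\phi$ | Win | 26 | 50 | 57 |
| | Tie | 6 | 5 | 5 |
| | Loss | 68 | 45 | 38 |
| | Score | **29** | **52.5** | 59.5 |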

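Table 2: Results of 100 games played against the LightRush bot on three large maps. In bold, results with the highest score for each map.

| Method | Result | 24x24 map | 32x32 map | 64x64 map |
|---|---|---|---|---|
| Baseline | Win | 59 | 56 | 21 |
| | Tie | 34 | 37 | 79 |
| | Loss | 7 | 7 | 0 |
| | Score | 76 | 74.5 | 60.5 |
| Expected utility | Win | 60 | 62 | 28 |
| | Tie | 37 | 35 | 70 |
| | Loss | 3 | 3 | 2 |
| | Score | 78.5 | **79.5** | 63 |
| RDU with optimistic $\phi$ | Win | 71 | 63 | 24 |
| | Tie | 21 | 32 | 76 |
| | Loss | 8 | 5 | 0 |
| | Score | **81.5** | 79 | 62 |
| RDU with pessimistic $\phi$ | Win | 66 | 54 | 27 |
| | Tie | 25 | 38 | 73 |
| | Loss | 9 | 8 | 0 |
| | Score | 78.5 | 73 | **63.5** |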

Equations

$$RDU(l)=u(x_{1})+\big(u(x_{2})-u(x_{1})\big)*\phi\left(\sum_{i=2}^{n}p_{i}\right)+\big(u(x_{3})-u(x_{2})\big)*\phi\left(\sum_{i=3}^{n}p_{i}\right)+\ldots+\big(u(x_{n})-u(x_{n-1})\big)*\phi(p_{n})$$

$$target_{X}=\min\big\{1,\ HX*assign_{HX}+RX*assign_{RX}+LX*assign_{LX}-enemyUnits_{X}\big\}$$


Code & Models

Repositories

richoux/microrts-uncertainty (official)


Full text

Constrained optimization under uncertainty for decision-making problems: Application to Real-Time Strategy games

Valentin Antuori

LS2N, Université de Nantes, France

Florian Richoux

LS2N, Université de Nantes, France

JFLI, CNRS, National Institute of Informatics, Japan

Abstract

Decision-making problems can be modeled as combinatorial optimization problems with Constraint Programming formalisms such as Constrained Optimization Problems. However, few Constraint Programming formalisms can deal with both optimization and uncertainty at the same time, and none of them are convenient for modeling the problems we tackle in this paper. Here, we propose a way to deal with combinatorial optimization problems under uncertainty within the classical Constrained Optimization Problems formalism by injecting the Rank Dependent Utility from decision theory. We also propose a proof of concept of our method to show it is implementable and can solve concrete decision-making problems using a regular constraint solver, and propose a bot that won the partially observable track of the 2018 $\mu$RTS AI competition. Our result shows it is possible to handle uncertainty with regular Constraint Programming solvers, without having to define a new formalism or to develop dedicated solvers. This brings a new perspective for tackling uncertainty in Constraint Programming.

1 Introduction

Decision-making problems can be modeled as combinatorial optimization problems through a given formalism, and can then be solved with appropriate tools, i.e., solvers. Combinatorial optimization problems are very common in domains such as logistics, finance, supply chain, planning and scheduling, and in industries such as pharmaceuticals, transportation, manufacturing and automotive **[15]**.

Strategy games offer a rich environment to study decision-making problems, allowing researchers to develop new algorithmic approaches to model and solve such problems. This is particularly true for Real-Time Strategy games, or RTS games, which offer a dynamic environment under a fog of war that prevents players from having complete information about the game state. Such environments contain many challenging combinatorial optimization problems.

Combinatorial optimization problems can be expressed through different formalisms. Two convenient formalisms used in AI are Constraint Satisfaction Problems (CSP) and Constrained Optimization Problems (COP). The first formalism deals with satisfaction problems, i.e., problems where all solutions have the same quality. In this paper, a solution is an assignment of a value to each variable of the problem such that all constraints are satisfied. The second formalism, COP, deals with optimization problems, i.e., problems where there is a criterion to rank solutions.

There exist many extensions of the CSP formalism dealing with uncertainty, but very few of them have been extended to handle optimization problems, and those that have force the modeler to declare additional parameters that can be undesirable and inconvenient when modeling a problem.

This paper proposes a way to deal with a specific kind of decision-making problem through combinatorial optimization under uncertainty within the classical COP formalism using the Rank Dependent Utility from decision theory. We exhibit a proof of concept with a simple bot playing the $\mu$RTS game while solving the decision-making problem of choosing the right units to produce. Our bot won the partially observable track of the 2018 $\mu$RTS AI competition.

This paper is organized as follows: In Section 2, we first motivate why we focus on single-stage decision-making problems and why uncertainty lies exclusively in the objective function. Then, we introduce basic notions about Constraint Programming and Decision Theory in Section 3. In Section 4, we present our main contribution, a way to handle uncertainty within the classical COP formalism using the Rank Dependent Utility, and in Section 5 we give a proof of concept by modeling a decision-making problem under uncertainty in $\mu$RTS as a COP. Related works can be found in Section 6. We conclude in Section 7.

2 Motivation

We introduce in this section the type of problems we focus on, and motivate why such problems are worth tackling specifically.

In this paper, we study single-stage decision-making problems that an agent must solve under uncertainty, where uncertainty lies in the value of some stochastic variables controlled by a third-party agent (such as the environment in which our agent evolves). Such stochastic values only have an impact on the objective function the agent tries to maximize or minimize, and not on the constraints it must satisfy.

We think it is important to motivate the two following points: why single-stage decision-making problems only, and why we only consider uncertainty on the objective function rather than on both the objective function and the constraints.

Studying single-stage decision-making problems means that a decision must be made before the stochastic values, so far unknown, are revealed. Once these values are known, the agent can only observe the consequences of its decision, without the possibility to sharpen or fix it as in multi-stage decision-making processes. Although multi-stage decision-making problems are interesting and would deserve a proper study, we think single-stage decision-making problems are still relevant and capture all one-shot decisions that must be made recurrently. Concrete examples are 1. a factory manager deciding on the production of the month, taking into account the stock (known) and client orders (unknown), 2. blind auctions where one aims to win some auctions, taking into account the available money (known) and the other participants' bids (unknown), or 3. air traffic management where one must take into account the number of planes waiting to take off and land (known) and future demands (unknown). In his PhD thesis **[10]**, Éric Piette shows that decision-making problems in strategy games can be handled in practice by single-stage decision-making problems only. Finally, another reason to study single-stage decision-making problems is that some environments do not allow multi-stage problems: To do multi-stage decision-making, some stochastic variables must be revealed at each stage. However, it is easy to find natural problems where stochastic variables are never completely revealed. This is the case in RTS games for instance, where the fog of war is never completely dissipated. The problem we tackle in this paper belongs to this category.

Considering that uncertainty only has an impact on the quality of a solution (the objective function) rather than on its feasibility (the constraints) makes sense for the same reasons as above: There are many concrete decision-making problems where one knows what is possible and what is not, but does not know what the quality of one's decisions will be. In other words, the scope of our possible decisions is known (our constraints), but we live in an uncertain, dynamic environment where events out of our control can impact not the applicability of our decisions but their quality. The examples cited in the previous paragraph are still relevant here: whatever our clients order, we can make a production plan for the month based on our available stock only; we can bid at an auction based only on our available money, but we can lose because of better bids; and we can plan air traffic knowing the current situation, but we can be overwhelmed by a group of arriving planes if we made bad runway assignments.

3 Preliminaries

3.1 Constraint Programming

The basic idea behind Constraint Programming is to deal with combinatorial problems by splitting them up into two distinct parts: the first part is modeling your problem via one Constraint Programming formalism. This is usually done by a human being and this task should ideally be easy and intuitive. The second part consists in finding one or several solutions based on your model. This is done by a solver, i.e., a program running without any human intervention.

The two main formalisms in Constraint Programming are Constraint Satisfaction Problems (CSP) and Constrained Optimization Problems (COP). The difference between a CSP and a COP is simple:

A CSP models a satisfaction problem, i.e., a problem where all solutions are equivalent; the goal is then to just find one of them, if any. For instance: finding a solution of a Sudoku grid. Good grids lead to a unique solution, but let's assume several solutions are possible for a given grid. Then, finding one solution is sufficient, and no solution is better than another. Sometimes, we may also be interested in finding all solutions of a problem instance.

A COP models an optimization problem, where some solutions are better than others. For instance: Several paths may exist from home to workplace, but one of them is the shortest.

Formally, a CSP is defined by a tuple ($V$, $D$, $C$) such that:

• $V$ is a set of variables,

• $D$ is a domain, i.e., a set of values for variables in $V$,

• $C$ is a set of constraints.

A constraint over $k$ variables can be seen as a function from $D^{k}$ to $\{true,false\}$ to make explicit what combinations of values among its $k$ variables are allowed or not.

Notice that $D$ should formally be the set of the domain for each variable in $V$ , thus a set of sets of values. However, it is common to define the same set of values for all variables of $V$ , thus one can simplify $D$ to be the set of values each variable in $V$ can take.

A CSP models a problem, and a problem instance is expressed by a CSP formula, i.e., a set of constraints applied on variables in $V$ where all constraints are linked by a logical and. The goal is then to assign a value in $D$ to each variable in $V$ such that all constraints in $C$ are satisfied, i.e., output true.

A COP is defined by a tuple ( $V$ , $D$ , $C$ , $f$ ) where $V$ , $D$ and $C$ represent the same sets as a CSP, and $f$ is an objective function applied on variables in $V$ . The goal is first to find a solution, i.e., a value of each variable such that all constraints are satisfied, like for CSP, but moreover to find the solution minimizing or maximizing the objective function $f$ among all possible solutions.
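To make the two definitions concrete, here is a minimal sketch in Python that solves a toy COP by exhaustive enumeration. The variable names, the domain, the two constraints and the objective are illustrative assumptions and do not come from the paper; a real constraint solver would search far more cleverly than this brute-force loop.

```python
from itertools import product

def solve_cop(variables, domain, constraints, objective):
    """Enumerate all assignments, keep those satisfying every constraint,
    and return the one maximizing the objective (None if unsatisfiable)."""
    best, best_value = None, None
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        # A CSP formula is the conjunction of all constraints.
        if not all(c(assignment) for c in constraints):
            continue
        value = objective(assignment)
        if best_value is None or value > best_value:
            best, best_value = assignment, value
    return best, best_value

# Hypothetical toy instance: V = {a, b}, D = {0, ..., 4}.
variables = ["a", "b"]
domain = range(5)
constraints = [
    lambda s: s["a"] + s["b"] <= 5,   # joint budget
    lambda s: s["a"] != s["b"],       # values must differ
]
objective = lambda s: 2 * s["a"] + 3 * s["b"]

solution, value = solve_cop(variables, domain, constraints, objective)
```

Dropping the `objective` argument and returning the first assignment that passes the constraint check would turn this sketch into a CSP solver, which illustrates how small the step from CSP to COP is.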

CSP and COP deal with certain information only. There exist many extensions of the CSP formalism dealing with uncertainty: Mixed CSP, Probabilistic CSP, Stochastic CSP, etc. We invite the reader to look at surveys [17] and [5] on this topic. However, few are convenient to model a decision-making problem where one does know what his or her possible choices are (i.e., variables, domains and constraints are known and fixed), but a third-party agent (a person, an environment, etc.) fixes the values of some specific variables. These values are unknown at the moment we must make a decision and impact the value output by the objective function. Stochastic CSP [19] is the formalism best adapted to model such problems, but with the huge drawback that constraints are considered to be chance-constraints, i.e., constraints are considered true if their probability of being true reaches a given threshold. The main problem with such a formalism is that this threshold must be provided by the human being modeling the problem, and it is often unclear in practice how to fix a good threshold value for a given problem. This does not follow the Constraint Programming philosophy where problem models must be easy to produce by a human being, without any arbitrary choices.

Moreover, while COP is a trivial extension of CSP with an objective function, it is absolutely not clear how to extend constraint satisfaction formalisms under uncertainty to deal with optimization problems. Indeed, several possible objective function values can correspond to each solution of a problem, due to uncertainty on stochastic variables, and such values depend on the state of the environment determining the stochastic variable values. How is it then possible to discriminate between solutions?

To the best of our knowledge, no Constraint Programming formalism without chance-constraints that is able to handle optimization problems under uncertainty has ever been proposed. We propose in Section 4 a way to deal with uncertainty within the classical COP formalism, allowing us to solve such problem models with classical solvers.

3.2 Decision Theory

We consider the set $\mathcal{D}$ of decisions an agent can take. The goal is to define a preference relation $\succeq_{\mathcal{D}}$ on this set. Preferring the decision $d_{1}$ over $d_{2}$ means to prefer $d_{1}$ consequences over $d_{2}$ ones, thus we can also consider a space $\mathcal{X}$ of consequences, and study a preference relation $\succeq_{\mathcal{X}}$ on this space in such a way that we have $d_{1}\succeq_{\mathcal{D}}d_{2}\iff x_{1}\succeq_{\mathcal{X}}x_{2}$ where $x_{i}$ is the consequence of the decision $d_{i}$ . However, we do not have this equivalence anymore when uncertainty comes into play, because we are not sure anymore the decision $d$ will lead to the consequence $x$ .

In uncertain environments, we consider the set $S$ of possible states of the environment. We consider consequences to be sets of states after making a decision. Thus, we have $\mathcal{X}=\mathcal{P}(S)$ .

Utility-based theories consider $P$ a probability distribution over $\mathcal{X}$ , i.e., a probability distribution over sets of possible states in $S$ . Let $p_{d}$ be the probability following $P$ of obtaining the consequence $x_{d}\in\mathcal{X}$ after making the decision $d$ .

We can then introduce the notion of lottery. A lottery $l$ is a tuple $(x_{1},p_{1};...;x_{n},p_{n})$ where $x_{i}$ is a consequence and $p_{i}$ its associated probability, such that $\sum_{i=1}^{n}p_{i}=1$ . A lottery is thus a sum-up of a decision in the sense it represents the list of possible consequences of a decision with their associated probabilities. Let $\mathcal{L}$ be the set of lotteries. We can then define a preference relation $\succeq_{\mathcal{L}}$ on $\mathcal{L}$ . How we define $\succeq_{\mathcal{L}}$ exactly depends on the decision theory, but the idea is to bring back the equivalence $d_{1}\succeq_{\mathcal{D}}d_{2}\iff l_{1}\succeq_{\mathcal{L}}l_{2}$ .

There exist different works on decision theory to establish this equivalence. We have thus the notion of Expected Utility (EU) defined by **[18]** in the game theory framework. However, EU has limited expressive power since one can quickly derive paradoxes such as the Allais Paradox, which violates the independence axiom stating that if someone has no preference between decisions A and B, then he or she must still have no preference if we mix A and B with some decision C.

Choquet Expected Utility is a decision theory based on capacities, a notion generalizing probabilities. A special case of Choquet Expected Utility restricted to probability deformation functions is the Rank Dependent Utility (RDU) introduced by **[11, 12]**. RDU has more expressive power than EU since it can explain the Allais Paradox. Unlike EU, RDU allows modeling attraction or repulsion to risks through a probability deformation function. This can help to modify on-the-fly the behavior of an agent making a decision with respect to its environment.

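The Rank Dependent Utility is then a way to compute $\succeq_{\mathcal{L}}$, and then to evaluate and compare lotteries such that $l_{1}\succeq_{\mathcal{L}}l_{2}\iff RDU(l_{1})\geq RDU(l_{2})$. RDU applied to the lottery $l$ is the function defined by Equation 1:

$$RDU(l)=u(x_{1})+\big(u(x_{2})-u(x_{1})\big)*\phi\left(\sum_{i=2}^{n}p_{i}\right)+\big(u(x_{3})-u(x_{2})\big)*\phi\left(\sum_{i=3}^{n}p_{i}\right)+\ldots+\big(u(x_{n})-u(x_{n-1})\big)*\phi(p_{n})\tag{1}$$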

In Equation 1, $u(x)$ is a utility function over the consequence space, intuitively giving a score to consequences, and $\phi(p)$ is an increasing function from $[0,1]$ to $[0,1]$ interpreted as a probability deformation function. The function $\phi(p)$ can be anything, as long as it is monotone and both equalities $\phi(0)=0$ and $\phi(1)=1$ hold. Consequences in the lottery $l$ are ordered such that $\forall x_{i},x_{j}$ with $i<j$, we have $u(x_{i})\leq u(x_{j})$.

This probability deformation function $\phi$ allows modeling the attitude toward risk, since a concave $\phi$ function defines an attraction to risks and a convex $\phi$ function a repulsion to risks. Intuitively, if we have $\phi(p)\leq p$ for all $p$, then the agent making a decision will underestimate the probabilities of gains and will then show a kind of pessimism about risks. We will have the opposite behavior if we have $\phi(p)\geq p$ for all $p$. Notice that sigmoid functions, which are neither concave nor convex, are also possible. In our experiments in Section 5, we use a sigmoid function rather than a convex function to model pessimism, to decrease probabilities of good outcomes and increase probabilities of unfavorable ones.
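The three shapes mentioned above can be sketched in Python as follows. The specific formulas (a square, a square root, and a rescaled logistic curve) and the `steepness` parameter are assumptions chosen for illustration only; this section does not specify the exact functions used in the experiments.

```python
import math

def phi_convex(p):
    """Convex deformation: phi(p) <= p, i.e. pessimistic about gains."""
    return p ** 2

def phi_concave(p):
    """Concave deformation: phi(p) >= p, i.e. optimistic about gains."""
    return math.sqrt(p)

def phi_sigmoid(p, steepness=10.0):
    """Logistic curve centred on 0.5, rescaled so phi(0) = 0 and phi(1) = 1."""
    raw = lambda q: 1.0 / (1.0 + math.exp(-steepness * (q - 0.5)))
    lo, hi = raw(0.0), raw(1.0)
    return (raw(p) - lo) / (hi - lo)
```

All three functions are increasing on $[0,1]$ and satisfy $\phi(0)=0$ and $\phi(1)=1$, which are the only requirements stated for a probability deformation function.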

Remember that consequences $x_{i}$ in $l$ are ordered according to the value of $u(x_{i})$, such that consequences with a small score output by the utility function $u$ are placed at the beginning of the lottery $l$. The intuition behind Equation 1 is then the following: With probability $p=1$, by making the decision $d$, you are sure to have at least the score of the worst consequence $x_{1}$, i.e., $u(x_{1})$. Then, with (deformed) probability $\phi(p_{2}+\ldots+p_{n})$, you can have the score $u(x_{1})$ plus a gain equal to $\big(u(x_{2})-u(x_{1})\big)$. With probability $\phi(p_{3}+\ldots+p_{n})$, you can have an additional gain equal to $\big(u(x_{3})-u(x_{2})\big)$, and so on until having an additional gain equal to $\big(u(x_{n})-u(x_{n-1})\big)$ with probability $\phi(p_{n})$. The obtained value depends on the order, or rank, of the value of the utility function applied to consequences, justifying the name “Rank Dependent Utility”.

However, defining a utility function $u$ over the consequence space is not easy, even for numerical-only consequences. This space is completely dependent on the problem and even on the problem instance, so it is not realistic to propose general-purpose utility functions that could work and certify a behavior on any kind of decision-making problem. This is however possible with the probability deformation function $\phi$ since it is always a function from $[0,1]$ to $[0,1]$.
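The reading of Equation 1 given above translates directly into a few lines of Python. This is a minimal sketch: the function name `rdu`, the representation of a lottery as a list of `(consequence, probability)` pairs, and the example lottery are assumptions made for illustration.

```python
def rdu(lottery, u=lambda x: x, phi=lambda p: p):
    """Rank Dependent Utility of a lottery [(x_1, p_1), ..., (x_n, p_n)]."""
    # Order consequences by increasing utility, as Equation 1 requires.
    ranked = sorted(lottery, key=lambda xp: u(xp[0]))
    utilities = [u(x) for x, _ in ranked]
    probs = [p for _, p in ranked]
    value = utilities[0]  # the worst score is obtained with certainty
    for i in range(1, len(ranked)):
        tail = sum(probs[i:])  # probability of reaching at least rank i
        value += (utilities[i] - utilities[i - 1]) * phi(tail)
    return value

lottery = [(0, 0.5), (10, 0.3), (20, 0.2)]          # hypothetical lottery
expected = rdu(lottery)                             # identity phi
pessimistic = rdu(lottery, phi=lambda p: p ** 2)    # convex phi
optimistic = rdu(lottery, phi=lambda p: p ** 0.5)   # concave phi
```

With the identity deformation the sum telescopes to the expected utility (7 for this lottery). The convex deformation yields a lower value (about 2.9) and the concave one a higher value, matching the pessimistic and optimistic interpretations.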

Our decision-making problems being modeled as optimization problems, a consequence $x$ of a decision $d$ is the value of our objective function. Therefore, the relation $\succeq_{\mathcal{X}}$ is merely the relation $\geq$ over real numbers. This implies that $u$ is a function from $\mathbb{R}$ to $\mathbb{R}$ . In this work, we consider $u$ to be the identity function $id(x)=x$ and will use generic probability deformation functions $\phi$ to change an agent’s behavior regarding risks.

4 Main contribution

The main difficulty in tackling a combinatorial optimization problem under uncertainty via Constraint Programming is the lack of a reliable criterion to attribute a quality to each possible solution. How do you rank solutions if they lead to different objective function values depending on the possible values of stochastic variables?

The main contribution of this paper is proposing to inject the Rank Dependent Utility from decision theory into the classical COP formalism to solve optimization problems under uncertainty.

We consider decision-making problems where we know what our variables are, what values they can take (i.e., we know the domain of each variable) and what combinations of values are possible or not (i.e., we know our constraints), but where the objective function to optimize involves stochastic variables whose values are unknown at the moment we must make a decision, such that only a third party (the environment, an independent agent, etc.) has the power to set the value of these variables.

This describes in fact most common decision-making situations: when we have to make a decision, we often miss some pieces of information (we cannot have perfect knowledge about everything) that still have an impact on the quality of our decision. Should I invest my money in stocks or bitcoins? We do not know if the price will climb or fall, but we do know what we can or cannot do (how much money we can invest, for instance). The quality of our decision is only revealed once the values of the stochastic variables are known.

4.1 Injecting RDU into COP

We recall that we are interested in modeling uncertainty in decision-making problems. Without uncertainty, these problems can be modeled through the COP formalism. In many cases, uncertainty in decision-making problems does not affect what you can or cannot do, but rather external unknown elements that have a direct impact on the decision quality.

An easy way to model such decision-making problems is to use the regular COP formalism, by defining 1. a set of decision variables, i.e., regular variables which the solver controls, 2. a set of stochastic variables, representing all unknown pieces of information, 3. a domain for both decision and stochastic variables, 4. a probability distribution over the domain of each stochastic variable, 5. a set of constraints upon decision variables and 6. an objective function mixing decision and stochastic variables. If the probability distributions of stochastic variables are unknown, we can approximate them with statistics. A convenient point with games is that we can often simulate their environment or analyze replays and then collect those statistics fairly easily.

As Equation 1 suggests, we need to know all consequences of a decision to compute RDU. This is of course intractable since we have $|D|^{|S|}$ consequences for each decision, with $|S|$ the number of stochastic variables and $|D|$ the cardinality of their domain. A convenient way to approximate RDU is to do Monte Carlo sampling of stochastic variable values, following their probability distribution.

We can now apply our objective function to compute the RDU and get a usable metric under uncertainty, which allows us to rank solutions and guide our decisions.

Let’s consider a problem modeled by a COP with decision variables $v_{i}$, stochastic variables $s_{j}$ and an objective function $f$. As described in Subsection 3.2, consequences $x_{i}$ of a decision $d$ correspond to values output by $f$. In the context of a COP, a decision $d$ is a vector of values assigned to each decision variable $v_{i}$. Using Algorithm 1, we can compute the relation $\succeq_{\mathcal{D}}$ among decisions by approximating the RDU of their respective lotteries with Monte Carlo sampling.

We give details about Algorithm 1 here. It takes as input a solution $d$ (or a decision), i.e., a vector of values in $D$ for each decision variable of the problem. The algorithm outputs a real number, a preference on the decision, i.e., its estimated RDU, giving us the opportunity to compare it with other decisions. Line 1 initializes a vector $x$ to save $k$ values of the objective function $f$, each value computed with a different sampling of stochastic variables. From Line 2 to Line 5, we sample stochastic variable values following their probability distribution, compute the values of $f$ for $d$ and the sampled values, and store them into the vector $x$. This vector is sorted in Line 6. Lines 7 and 8 compute an approximation of the RDU by applying Equation 1 and return this value.

In our experiments in the next section, we draw $k=50$ samples. In Algorithm 1, $k$ is a parameter and not an input, so we do not take it into account when computing the algorithm's complexity. The complexity of Algorithm 1 is then in $\Theta(f)$, depending on the complexity of the objective function $f$ only. Sampling $m$ stochastic variables is also outside the scope of the complexity of Algorithm 1 since stochastic variables are not among its inputs.
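The description above can be sketched in Python as follows. This is one possible reading of the prose, not the authors' code: the function and parameter names are hypothetical, and it assumes that each of the $k$ samples is treated as a consequence of probability $1/k$, so that the deformed tail probability at sorted position $i$ is $\phi((k-i)/k)$.

```python
import random

def estimate_rdu(decision, objective, sample_stochastic, phi, k=50, rng=random):
    """Monte Carlo estimate of the RDU of a decision: sample, evaluate,
    sort, then apply Equation 1 with each sample carrying probability 1/k."""
    x = []
    for _ in range(k):
        s = sample_stochastic(rng)         # draw stochastic variable values
        x.append(objective(decision, s))   # objective value for this draw
    x.sort()
    value = x[0]
    for i in range(1, k):
        # (k - i) of the k samples are ranked at position i or above.
        value += (x[i] - x[i - 1]) * phi((k - i) / k)
    return value

# Hypothetical toy problem: payoff is min(decision, demand),
# with demand drawn uniformly from 0..5.
rng = random.Random(0)
score = estimate_rdu(
    decision=3,
    objective=lambda d, s: min(d, s),
    sample_stochastic=lambda r: r.randint(0, 5),
    phi=lambda p: p ** 2,
    rng=rng,
)
```

A solver can call such a function as its objective, which is what lets a regular COP solver rank candidate decisions under uncertainty. With the identity deformation, the estimate reduces to the plain average of the sampled objective values.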

5 Proof of concept

We give a proof of concept of our contribution to show it is implementable, and use it to solve a decision-making problem under uncertainty in an RTS game. We have included this decision-making system in a bot playing the game $\mu$RTS. Our bot, named POAdaptive, won the partially observable track of the 2018 $\mu$RTS AI competition organized within the CIG 2018 conference. The code of our bot, our experimental setup and our experimental results can be found in the following GitHub repository: github.com/richoux/microrts-uncertainty/tree/v1.0.

We consider the following problem: RTS games let players train units which often follow a rock-paper-scissors scheme. Because of the fog of war, we do not perfectly know the enemy army composition and we must infer his or her strategy from some partial observations. We must constantly make a production decision answering this question: “What units should I produce next to counter my enemy's strategy?”

5.1 $\mu$RTS

We decided to use $\mu$RTS as an experimental environment. $\mu$RTS is an open-source, minimalist real-time strategy game developed by Santiago Ontañón for research purposes **[9]**.

The game is built upon classical RTS mechanisms: there are resources (or money) to gather (green squares in Figure 1). This money allows us to build buildings and train units. In $\mu$RTS, there are two kinds of buildings: bases (white squares) where money is stocked and barracks (grey squares) where army units are produced. Four units are available in $\mu$RTS: workers (small grey circles), light units (orange circles, not appearing in Figure 1), ranged units (blue circles) and heavy units (large yellow circles). Workers are weak against all units but are the only ones able to gather resources and build buildings. Light, ranged and heavy units follow a rock-paper-scissors scheme, in the sense that heavy units are strong against light units, light units are strong against ranged units and ranged units are strong against heavy units.

To win, a player must destroy all enemy units and buildings. If nobody reaches that goal before a fixed number of frames, the game ends in a draw.

$\mu$RTS supports both completely and partially observable games. In order to test our method for solving decision-making problems under uncertainty, we used $\mu$RTS exclusively in partially observable mode.

5.2 Deciding about unit production

We propose here a model of our production problem through the regular COP formalism. Let’s consider $\{H,L,R\}$ the heavy, light and ranged type of units, respectively. We have:

• Two kinds of decision variables: $plan_{X}$, with $X\in\{H,L,R\}$, representing the total number of units of type $X$ we should have (i.e., the total number of units we currently possess plus the number of units we plan to produce), and $assign_{XY}$, $\forall X,Y\in\{H,L,R\}$, the number of our units of type $X$ we plan to use to counter enemy units of type $Y$.

• One kind of stochastic variable: $enemyUnits_{X}$, with $X\in\{H,L,R\}$, representing the total number of units of type $X$ the enemy currently possesses.

• Domains for each variable are natural numbers from 0 to a threshold. We used 20 as a threshold in our experiments, which is sufficient for small maps in $\mu$RTS.

• Two kinds of constraints:

$assign_{HL}+assign_{HR}+assign_{HH}=plan_{H}$

$assign_{RL}+assign_{RR}+assign_{RH}=plan_{R}$

$assign_{LL}+assign_{LR}+assign_{LH}=plan_{L}$

$3(plan_{H}-ourUnits_{H})+2(plan_{R}-ourUnits_{R})+2(plan_{L}-ourUnits_{L})\leq stockResource$

The first three constraints create the bridge between the total number of units of type $X$ we aim to have and the number of units of type $X$ we consider we need to counter an unknown number of enemy units of type $Y$. The last constraint is the resource balance constraint: given $ourUnits_{X}$ the number of units of type $X$ we currently have, ($plan_{X}-ourUnits_{X}$) corresponds to the number of units of type $X$ we have to produce. A heavy unit costs 3 resource points, while light and ranged units cost only 2 each. The parameter $stockResource$ corresponds to the resource points we possess at the moment we must decide about our production.

• The objective function $\max target_{H}+target_{L}+target_{R}$ with

$target_{X}=\min(1,\ HX*assign_{HX}+RX*assign_{RX}+LX*assign_{LX}-enemyUnits_{X})$

where $X\in\{H,L,R\}$ and the coefficients of $AB$-type (i.e., $HH$, $HL$, $\ldots$, $RR$) are constants representing how many units of type $A$ we need to counter a unit of type $B$. The min function in $target_{X}$ avoids a mere sum of the expressions $HX*assign_{HX}+RX*assign_{RX}+LX*assign_{LX}-enemyUnits_{X}$ over the three possible $X\in\{H,L,R\}$, which would simply lead to producing the unit with the highest $AB$-type coefficient. We take the minimum between each of these expressions and the value 1 to allow producing up to one more unit than necessary.
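The model above can be sketched in a few lines of Python to make the constraints and the objective concrete. This is only an illustrative evaluator for one candidate decision and one realisation of the stochastic variables, not the GHOST model used by the authors. The function names, the dictionary layout and the sample numbers are assumptions; only the unit costs and the $HL$ and $LH$ coefficients come from the text, and all other coefficients are placeholders set to 1.0.

```python
TYPES = ("H", "L", "R")
COST = {"H": 3, "L": 2, "R": 2}  # resource cost per unit type, as stated in the text


def is_feasible(plan, assign, our_units, stock_resource):
    """Check the three linking constraints and the resource balance constraint."""
    for x in TYPES:
        # assign_XH + assign_XL + assign_XR must equal plan_X
        if sum(assign[(x, y)] for y in TYPES) != plan[x]:
            return False
    spent = sum(COST[x] * (plan[x] - our_units[x]) for x in TYPES)
    return spent <= stock_resource


def objective(assign, enemy_units, coeff):
    """Sum over X of min(1, sum_A coeff[A, X] * assign[A, X] - enemy_units[X])."""
    total = 0.0
    for x in TYPES:
        covered = sum(coeff[(a, x)] * assign[(a, x)] for a in TYPES)
        total += min(1.0, covered - enemy_units[x])
    return total


# Illustrative data only: HL and LH are taken from the text,
# every other coefficient is a placeholder.
coeff = {(a, b): 1.0 for a in TYPES for b in TYPES}
coeff[("H", "L")] = 0.3738
coeff[("L", "H")] = 2.675

our_units = {"H": 0, "L": 0, "R": 0}
plan = {"H": 2, "L": 1, "R": 0}
assign = {(a, b): 0 for a in TYPES for b in TYPES}
assign[("H", "L")] = 2  # two heavy units sent against enemy light units
assign[("L", "L")] = 1  # one light unit sent against enemy light units
enemy_units = {"H": 0, "L": 1, "R": 0}  # one possible realisation

feasible = is_feasible(plan, assign, our_units, stock_resource=8)
score = objective(assign, enemy_units, coeff)
```

Because $enemyUnits_{X}$ is stochastic, the same decision yields a different `score` for each realisation; this is the reason a preference such as RDU is needed to rank decisions.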

Our $AB$-type coefficients have been estimated by running 200 games of 10 units of $A$ against 10 units of $B$, for each combination $AB\in\{H,L,R\}^{2}$. We then took the ratio of the total numbers of surviving units over the 200 simulations. For instance, after 200 games of type “10 heavy versus 10 light”, we had 1284 surviving heavy units and 480 surviving light units. Our parameter $HL$ is then equal to $\frac{480}{1284}=0.3738$ (i.e., we need 0.3738 heavy units to deal with 1 light unit) and $LH$ is equal to $\frac{1284}{480}=2.675$.
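The ratio computation can be written out as a short helper. This is a minimal sketch; the function name and the error handling for zero survivors are assumptions, since the text does not say how such a case is treated.

```python
def counter_coefficients(survivors_a, survivors_b):
    """Return (AB, BA) from the total survivors of each side over all simulations."""
    if survivors_a <= 0 or survivors_b <= 0:
        # Assumption: the text does not cover matchups where one side never survives.
        raise ValueError("both survivor totals must be positive")
    return survivors_b / survivors_a, survivors_a / survivors_b


# Totals reported in the text for "10 heavy versus 10 light" over 200 games.
hl, lh = counter_coefficients(survivors_a=1284, survivors_b=480)
```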

Finally, statistics on the $enemyUnits_{X}$ stochastic variables have been gathered by analyzing 800 replays of $\mu$RTS games from the 2017 competitions. For each frame and each unit type, we counted the occurrences of these units. These statistics are sharpened by observations made while playing a game: for instance, if we observe 3 enemy light units at the same moment, we set to zero the probabilities that the enemy has only 0, 1 or 2 light units, and we normalize the remaining probabilities.
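The sharpening step amounts to conditioning the prior on "at least the observed count". A minimal sketch follows, assuming the distribution is stored as a list indexed by unit count; the function name and the prior values are made up for illustration and are not the replay statistics.

```python
def sharpen(prior, observed):
    """Zero the probabilities of counts below `observed`, then renormalize."""
    posterior = [p if count >= observed else 0.0 for count, p in enumerate(prior)]
    total = sum(posterior)
    if total == 0.0:
        # Assumption: an observation outside the prior's support is treated as an error.
        raise ValueError("observation is impossible under the prior")
    return [p / total for p in posterior]


# Hypothetical prior over 0..3 enemy light units.
prior = [0.4, 0.3, 0.2, 0.1]
posterior = sharpen(prior, observed=2)  # we currently see 2 enemy light units
```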

5.3 Experiments

Few $\mu$RTS bots have been developed for partially observable games, and most of them are in fact scripted bots. We took a basic rush bot and modified only its production behavior, which gives our bot, POAdaptive. We did not modify its initial build order (produce no additional workers, immediately start building a barracks with our single worker and then gather resources until the end of the game). We only added a quick hit-and-run behavior for our ranged units and a light seek-and-destroy behavior.

Our bot adapts its production according to the RDU preference computed from the objective function. We give our pure COP model to the GHOST solver **[13]**, a solver that deals with classical CSP or COP models and is unable to handle uncertainty directly. This shows that our way of injecting decision theory into the classical COP formalism is sufficient to handle uncertainty and does not require developing a new formalism or dedicated solvers. We give the solver a computation budget of 100 milliseconds per frame to solve our COP problem, to be consistent with the rules of $\mu$RTS competitions.

With our bot POAdaptive, we won the partially observable track of the 2018 $\mu$RTS AI competition, over 7 competitors. We did not tweak our bot before the competition to be efficient on the competition maps or against specific bots (such as the 2017 competitors). This shows that our decision-making approach is efficient enough to beat scripted rush bots as well as MCTS-based and Hierarchical Task Network-based bots. Figure 2 shows the final scores of the 2018 $\mu$RTS AI competition, over 720 games for each bot within a round-robin tournament over 12 maps (4 of them were kept secret before the beginning of the competition). The cumulative score is the sum of the scores of all games, i.e., 1 for a win, 0.5 for a tie and 0 for a loss. The scores are normalized by dividing the cumulative scores by the number of games per bot, i.e., 720.

To evaluate our decision-making process, we ran 100 games (50 starting at the North-East position, 50 starting at the South-West position) between POLightRush, the second-best bot of the competition, and four methods: POAdaptive using RDU with a pessimistic $\phi$ function, POAdaptive using RDU with an optimistic $\phi$ function, POAdaptive using Expected Utility instead of RDU (this is easily done by using RDU with $\phi$ as the identity function), and finally a baseline bot having exactly the same behavior as POAdaptive except for the unit production decision, which is taken randomly among the three military units. The pessimistic function we use is the logistic function $\phi(p)=\frac{1}{1+\exp(-\lambda*(2*p-shift))}$, where $p$ is the probability, with parameters $\lambda=10$ and $shift=1.3$. The optimistic function is the logit function $\phi(p)=1+\frac{\log(\frac{p}{2-p})}{\lambda}$ with $\lambda=10$.
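The two $\phi$ functions and their effect on the preference value can be sketched as follows. The `rdu` function assumes the standard decumulative form of Rank Dependent Utility, $u_{1}+\sum_{i\geq 2}(u_{i}-u_{i-1})\,\phi(P(U\geq u_{i}))$ with utilities sorted in increasing order; the definition given in Section 3.2 remains the authoritative one. The function names and the two-outcome lottery are illustrative assumptions.

```python
import math


def phi_pessimistic(p, lam=10.0, shift=1.3):
    """Logistic probability transformation from the text."""
    return 1.0 / (1.0 + math.exp(-lam * (2.0 * p - shift)))


def phi_optimistic(p, lam=10.0):
    """Logit probability transformation from the text (undefined at p = 0)."""
    return 1.0 + math.log(p / (2.0 - p)) / lam


def rdu(outcomes, phi):
    """Rank Dependent Utility of a lottery given as (utility, probability) pairs.

    Assumes the decumulative form: each utility increment is weighted by the
    transformed probability of obtaining at least that utility.
    """
    ranked = sorted(outcomes)  # increasing utility
    value = ranked[0][0]
    tail = 1.0  # probability of reaching at least the current utility
    for (prev_u, prev_p), (u, _) in zip(ranked, ranked[1:]):
        tail -= prev_p
        value += (u - prev_u) * phi(tail)
    return value


# Hypothetical lottery: objective value 0 or 1, each with probability 0.5.
lottery = [(0.0, 0.5), (1.0, 0.5)]
expected_utility = rdu(lottery, lambda p: p)  # identity phi gives Expected Utility
pessimistic_value = rdu(lottery, phi_pessimistic)
optimistic_value = rdu(lottery, phi_optimistic)
```

On this lottery the pessimistic value falls below the expected utility and the optimistic value above it, which matches the intended reading of the two $\phi$ functions.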

We ran experiments on three small basic maps (8x8, 12x12 and 16x16) as well as on three large basic maps (24x24, 32x32 and 64x64). Results on small maps are shown in Table 1, and normalized scores are illustrated in Figure 3. These small-map results are more representative of the performance of our decision-making system. Indeed, on larger maps, POAdaptive has too few occasions to meet enemy units since no scouting behavior has been written (we know from experience that coding a proper scouting behavior is not trivial). Thus, the behavior of the four methods tends to be the same. POAdaptive’s adaptation skills still give it a slight advantage, and therefore slightly better scores than the baseline method. On these larger maps, RDU with the optimistic $\phi$ gives the best results (this optimistic version is also the one used for the CIG 2018 competition), slightly better than the pessimistic version. Results on large maps are shown in Table 2 and Figure 4.

On small maps, Table 1 and Figure 3 show that both RDU versions outperformed the EU version, which itself outperformed our baseline. Unlike on large maps, the pessimistic version gives slightly better results than the optimistic version. This can be explained by the fact that small maps do not leave much time to react when an unfavorable enemy army composition is spotted. Being already prepared for the worst helps in that case.

6 Related works

Uncertainty has been intensively studied for the last 30 years in fields dealing with combinatorial optimization such as Operations Research **[2]**, but significantly less work has been done on optimization under uncertainty in Constraint Programming **[5]**. To the best of our knowledge, all methods to solve optimization problems through Stochastic Constraint Programming use a formalism considering chance-constraints, usually handled by scenario-based methods. **[1]** is a recent example where the authors inject a probabilistic inference engine from the graphical model community into a classical solver to solve Stochastic CSP instances, thus dealing with chance-constraints.

There are also few works in RTS Game AI using Constraint Programming techniques, in particular through Constraint Satisfaction/Optimization Problems, and few of them deal with uncertainty. However, we can cite the works of **[6, 7, 8]**, where the authors use Stochastic CSP to make a bot participating in the General Game Playing competition. Their bot won the 2016 competition.

Although the following papers do not deal with uncertainty, they all focus on solving optimization problems in RTS games, in particular StarCraft. Thus, **[14]** propose to model with COP the optimal building placement to make a wall at a base entrance, in order to make it easier to defend. **[4, 13]** propose a CSP and COP solver, GHOST, that we used for our experiments. Their Constraint Programming solver has been designed to output good-quality solutions within some tens of milliseconds, making it usable in RTS games.

Beyond Constraint Programming but close enough, **[3]** use a branch and bound algorithm to optimize build orders in the RTS game StarCraft. Like **[14]**, **[16]** tackle the problem of optimizing a wall-in building placement in StarCraft, but through the prism of Answer-Set Programming.

7 Conclusion

In this paper, we proposed a way to deal with combinatorial optimization problems under uncertainty within the classical Constrained Optimization Problems formalism by injecting the Rank Dependent Utility from decision theory. The difficulty for Constraint Programming formalisms in handling both optimization and uncertainty at the same time stems from the impossibility of ranking solutions when they lead to different objective function values depending on the values taken by the stochastic variables.

We get around this difficulty by computing preferences over decisions with the Rank Dependent Utility, using our objective function to score decisions. This allows us to show that it is possible to handle uncertainty with regular Constraint Programming solvers, without having to define a new formalism or to develop dedicated solvers for uncertainty. This brings new perspectives for tackling uncertainty in combinatorial optimization problems that were so far considered intractable.

To show that our result is usable in practice, we propose a proof of concept by modeling a decision-making problem under uncertainty in the $\mu$RTS game via the classical COP formalism, and we solve it using a regular COP solver. We thus tackle a unit production problem and implement a bot playing partially observable $\mu$RTS games and deciding what units to produce in order to maximize its chance to counter its opponent’s strategy. Our bot won the partially observable track of the 2018 $\mu$RTS AI competition and outperforms equivalent bots based on Expected Utility or on randomly producing units.

Our result only concerns short-horizon decision-making problems. We could adapt it to take into account larger horizons of action planning and integrate it into a bot making long-term strategic decisions under uncertainty. We would also like to investigate problems where constraints contain stochastic variables. Finally, it would be interesting to implement our result in a bot playing a more ambitious game such as StarCraft.

Acknowledgment

This research was supported by the Pays de la Loire region through the Atlanstic 2020 research grant COPUL.

Bibliography

[1] B. Babaki, T. Guns, and L. de Raedt. Stochastic Constraint Programming with And-Or Branch-and-Bound. In IJCAI’17, pages 539–545, 2017.
[2] J. R. Birge and F. Louveaux. Introduction to Stochastic Programming. Springer Publishing Company, Incorporated, 2nd edition, 2011.
[3] D. Churchill and M. Buro. Build order optimization in StarCraft. In AIIDE’11, pages 14–19, 2011.
[4] J. Fradin and F. Richoux. Robustness and Flexibility of GHOST. In the workshop RTS of AIIDE’15, pages 9–14, 2015.
[5] B. Hnich, R. Rossi, S. A. Tarim, and S. Prestwich. A Survey on CP-AI-OR Hybrids for Decision Making Under Uncertainty. In CP-AI-OR’11, pages 227–270, 2011.
[6] F. Koriche, S. Lagrue, É. Piette, and S. Tabary. General game playing with Stochastic CSP. Constraints, 21(1):95–114, 2016.
[7] F. Koriche, S. Lagrue, É. Piette, and S. Tabary. Stochastic Constraint Programming for General Game Playing with Imperfect Information. In GIGA’16, 2016.
[8] F. Koriche, S. Lagrue, É. Piette, and S. Tabary. Constraint-Based Symmetry Detection in General Game Playing. In IJCAI’17, pages 280–287, 2017.
