Exploring optimal institutional incentives for public cooperation

Shengxian Wang; Xiaojie Chen; and Attila Szolnoki

arXiv:1907.10989·physics.soc-ph·September 4, 2019

Exploring optimal institutional incentives for public cooperation

Shengxian Wang, Xiaojie Chen, and Attila Szolnoki

PDF

TL;DR

This paper uses optimal control theory to identify cost-effective institutional incentives, both rewards and punishments, that promote public cooperation efficiently in well-mixed populations.

Contribution

It formulates and solves two optimal control problems to determine minimal-cost incentive strategies for fostering cooperation, a novel approach in this context.

Findings

01

Optimal punishment is more cost-effective than reward for achieving cooperation.

02

The derived strategies minimize cumulative costs while ensuring desired cooperation levels.

03

Numerical examples confirm the theoretical results and effectiveness of the strategies.

Abstract

Prosocial incentive can promote cooperation, but providing incentive is costly. Institutions in human society may prefer to use an incentive strategy which is able to promote cooperation at a reasonable cost. However, thus far few works have explored the optimal institutional incentives which minimize related cost for the benefit of public cooperation. In this work, in combination with optimal control theory we thus formulate two optimal control problems to explore the optimal incentive strategies for institutional reward and punishment respectively. By using the approach of Hamilton-Jacobi-Bellman equation for well-mixed populations, we theoretically obtain the optimal positive and negative incentive strategies with the minimal cumulative cost respectively. Additionally, we provide numerical examples to verify that the obtained optimal incentives allow the dynamical system to reach the…

Figures3

Click any figure to enlarge with its caption.

Equations91

Π_{C} = \frac{r c ( n _{C} + 1 )}{n} - c + \frac{an u}{n _{C} + 1},

Π_{C} = \frac{r c ( n _{C} + 1 )}{n} - c + \frac{an u}{n _{C} + 1},

Π_{D} = \frac{r c n _{C}}{n} .

Π_{D} = \frac{r c n _{C}}{n} .

Π_{D} = \frac{r c n _{C}}{n} - \frac{bn u}{n _{D} + 1},

Π_{D} = \frac{r c n _{C}}{n} - \frac{bn u}{n _{D} + 1},

Π_{C} = \frac{r c ( n _{C} + 1 )}{n} - c .

Π_{C} = \frac{r c ( n _{C} + 1 )}{n} - c .

\overset{x}{˙} = x (1 - x) (p_{C} - p_{D}),

\overset{x}{˙} = x (1 - x) (p_{C} - p_{D}),

p_{i} = n_{C} = 0 \sum n - 1 (n _{C} n - 1) x^{n_{C}} (1 - x)^{n_{D}} Π_{i}, (i = C or D) .

p_{i} = n_{C} = 0 \sum n - 1 (n _{C} n - 1) x^{n_{C}} (1 - x)^{n_{D}} Π_{i}, (i = C or D) .

p_{C} = \frac{r c [ 1 + ( n - 1 ) x ]}{n} - c + u \frac{1 - ( 1 - x ) ^{n}}{x},

p_{C} = \frac{r c [ 1 + ( n - 1 ) x ]}{n} - c + u \frac{1 - ( 1 - x ) ^{n}}{x},

p_{D} = \frac{r c ( n - 1 ) x}{n} .

p_{D} = \frac{r c ( n - 1 ) x}{n} .

\overset{x}{˙} = x (1 - x) [u \frac{1 - ( 1 - x ) ^{n}}{x} - \frac{( n - r ) c}{n}] .

\overset{x}{˙} = x (1 - x) [u \frac{1 - ( 1 - x ) ^{n}}{x} - \frac{( n - r ) c}{n}] .

p_{C} = \frac{r c [ 1 + ( n - 1 ) x ]}{n} - c,

p_{C} = \frac{r c [ 1 + ( n - 1 ) x ]}{n} - c,

p_{D} = \frac{r c x ( n - 1 )}{n} - u \frac{1 - x ^{n}}{1 - x} .

p_{D} = \frac{r c x ( n - 1 )}{n} - u \frac{1 - x ^{n}}{1 - x} .

\overset{x}{˙} = x (1 - x) [u \frac{1 - x ^{n}}{1 - x} - \frac{( n - r ) c}{n}] .

\overset{x}{˙} = x (1 - x) [u \frac{1 - x ^{n}}{1 - x} - \frac{( n - r ) c}{n}] .

J = \int_{t_{0}}^{t_{f}} \frac{( n u ) ^{2}}{2} d t,

J = \int_{t_{0}}^{t_{f}} \frac{( n u ) ^{2}}{2} d t,

\begin{split}&\min\,\,J=\int^{t_{f}}_{0}\frac{(nu)^{2}}{2}dt,\\ &{\rm s.t.}\quad\left\{\begin{array}[]{lc}\dot{x}=x(1-x)[u\frac{1-(1-x)^{n}}{x}-\frac{(n-r)c}{n}],\\ x(0)=x_{0},\\ x(t_{f})=1-\delta.\end{array}\right.\end{split}

\begin{split}&\min\,\,J=\int^{t_{f}}_{0}\frac{(nu)^{2}}{2}dt,\\ &{\rm s.t.}\quad\left\{\begin{array}[]{lc}\dot{x}=x(1-x)[u\frac{1-(1-x)^{n}}{x}-\frac{(n-r)c}{n}],\\ x(0)=x_{0},\\ x(t_{f})=1-\delta.\end{array}\right.\end{split}

\begin{split}&\min\,\,J=\int^{t_{f}}_{0}\frac{(nu)^{2}}{2}dt,\\ &{\rm s.t.}\quad\left\{\begin{array}[]{lc}\dot{x}=x(1-x)[u\frac{1-x^{n}}{1-x}-\frac{(n-r)c}{n}],\\ x(0)=x_{0},\\ x(t_{f})=1-\delta.\end{array}\right.\end{split}

\begin{split}&\min\,\,J=\int^{t_{f}}_{0}\frac{(nu)^{2}}{2}dt,\\ &{\rm s.t.}\quad\left\{\begin{array}[]{lc}\dot{x}=x(1-x)[u\frac{1-x^{n}}{1-x}-\frac{(n-r)c}{n}],\\ x(0)=x_{0},\\ x(t_{f})=1-\delta.\end{array}\right.\end{split}

u_{R}^{*} (t) = \frac{2 ( n - r ) c}{n} \frac{x}{1 - ( 1 - x ) ^{n}} .

u_{R}^{*} (t) = \frac{2 ( n - r ) c}{n} \frac{x}{1 - ( 1 - x ) ^{n}} .

\overset{x}{˙} = \frac{( n - r ) c}{n} x (1 - x),

\overset{x}{˙} = \frac{( n - r ) c}{n} x (1 - x),

x (t) = \frac{1}{1 + ( \frac{1}{x _{0}} - 1 ) e ^{- \frac{( n - r ) c}{n} t}} .

x (t) = \frac{1}{1 + ( \frac{1}{x _{0}} - 1 ) e ^{- \frac{( n - r ) c}{n} t}} .

u_{P}^{*} (t) = \frac{2 ( n - r ) c}{n} \frac{( 1 - x )}{1 - x ^{n}} .

u_{P}^{*} (t) = \frac{2 ( n - r ) c}{n} \frac{( 1 - x )}{1 - x ^{n}} .

\overset{x}{˙} = \frac{( n - r ) c}{n} x (1 - x),

\overset{x}{˙} = \frac{( n - r ) c}{n} x (1 - x),

x (t) = \frac{1}{1 + ( \frac{1}{x _{0}} - 1 ) e ^{- \frac{( n - r ) c}{n} t}} .

x (t) = \frac{1}{1 + ( \frac{1}{x _{0}} - 1 ) e ^{- \frac{( n - r ) c}{n} t}} .

f (x, u, t) = x (1 - x) [u \frac{1 - ( 1 - x ) ^{n}}{x} - \frac{( n - r ) c}{n}] .

f (x, u, t) = x (1 - x) [u \frac{1 - ( 1 - x ) ^{n}}{x} - \frac{( n - r ) c}{n}] .

H_{R} (x, u, t)

H_{R} (x, u, t)

= \frac{( n u ) ^{2}}{2} + x (1 - x)

[u \frac{1 - ( 1 - x ) ^{n}}{x} - \frac{( n - r ) c}{n}] \frac{\partial J ^{*}}{\partial x},

u_{R}^{*} (t) = - \frac{( 1 - x ) [ 1 - ( 1 - x ) ^{n} ]}{n ^{2}} \frac{\partial J ^{*}}{\partial x} .

u_{R}^{*} (t) = - \frac{( 1 - x ) [ 1 - ( 1 - x ) ^{n} ]}{n ^{2}} \frac{\partial J ^{*}}{\partial x} .

- \frac{\partial J ^{*}}{\partial t} = H_{R} [x, u_{R}^{*} (t), t] .

- \frac{\partial J ^{*}}{\partial t} = H_{R} [x, u_{R}^{*} (t), t] .

- \frac{\partial J ^{*}}{\partial t}

- \frac{\partial J ^{*}}{\partial t}

- \frac{( n - r ) c}{n} x (1 - x) \frac{\partial J ^{*}}{\partial x} .

\frac{\partial J ^{*}}{\partial t} = 0.

\frac{\partial J ^{*}}{\partial t} = 0.

\begin{array}[]{l}\frac{\partial J^{*}}{\partial x}=0\;\;{\rm or}\;\;\frac{\partial J^{*}}{\partial x}=-\frac{2n(n-r)cx}{(1-x)[1-(1-x)^{n}]^{2}}.\end{array}

\begin{array}[]{l}\frac{\partial J^{*}}{\partial x}=0\;\;{\rm or}\;\;\frac{\partial J^{*}}{\partial x}=-\frac{2n(n-r)cx}{(1-x)[1-(1-x)^{n}]^{2}}.\end{array}

\frac{\partial J ^{*}}{\partial x} < 0.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Exploring optimal institutional incentives for public cooperation

Shengxian Wang

Xiaojie Chen

[email protected]

Attila Szolnoki

School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China

Institute of Technical Physics and Materials Science, Centre for Energy Research, Hungarian Academy of Sciences, P.O. Box 49, H-1525 Budapest, Hungary

Abstract

Prosocial incentive can promote cooperation, but providing incentive is costly. Institutions in human society may prefer to use an incentive strategy which is able to promote cooperation at a reasonable cost. However, thus far few works have explored the optimal institutional incentives which minimize related cost for the benefit of public cooperation. In this work, in combination with optimal control theory we thus formulate two optimal control problems to explore the optimal incentive strategies for institutional reward and punishment respectively. By using the approach of Hamilton-Jacobi-Bellman equation for well-mixed populations, we theoretically obtain the optimal positive and negative incentive strategies with the minimal cumulative cost respectively. Additionally, we provide numerical examples to verify that the obtained optimal incentives allow the dynamical system to reach the desired destination at the lowest cumulative cost in comparison with other given incentive strategies. Furthermore, we find that the optimal punishing strategy is a cheaper way for obtaining an expected cooperation level when it is compared with the optimal rewarding strategy.

keywords:

Evolutionary game dynamics, Evolution of cooperation, Control theory, Public goods game

1 Introduction

Cooperation is of vital importance in the contemporary era and identified as an essential principle of evolution [1, 2, 3]. However, self-regarding individuals constantly strive to maximize their own personal benefit which would inevitably lead to the collapse of cooperation, in the scenario where what is best for the collectivity is at odds with what is best for an individual. This is referred to as social dilemma of cooperation [4, 5, 6, 7], which has received considerable attention in recent years [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]. To overcome this conflict social institutions frequently apply two control mechanisms, that is, rewards (positive incentives) for cooperation [29, 30, 31, 32, 33, 34, 35, 36, 37, 38] and punishments (negative incentives) for defection [39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49].

In theory, cooperative behavior will occur as long as the amount of incentives outweighs the payoff difference between cooperators and defectors no matter whether incentive is positive or negative. However, executing an incentive strategy into the system is costly and the cumulative costs for different incentive strategies could be significantly different. This observation has inspired several studies to explore efficient incentive strategies for the evolution of cooperation at a low cost [32, 34, 35, 36]. Recently, Sasaki et al. have introduced institutional positive and negative incentives into the public goods dilemma respectively and found that social dilemma of cooperation can be overcome by both institutional punishment and reward. More interestingly, they have found that institutional punishment can overcome social dilemma at a much lower cost with the help of optimal participation [35]. Subsequently, Chen et al. have proposed an optimal incentive strategy in combination with institutional punishment and reward, called “first carrot, then stick”, which is able to unexpectedly succeed in promoting cooperation. Furthermore they also demonstrated that the “first carrot, then stick” strategy makes full cooperation established and recovered at lower cost under wider range of conditions than either reward or punishment alone no matter well-mixed or spatially structured populations are considered [34]. Nevertheless, it is worth pointing out that these previous works fixed the per capita incentive, which is an important parameter determining the cumulative cost during the evolutionary process, and they focused on the dominant low-cost incentive strategy. In other words, they did not explore the optimal incentive strategy with the minimal cumulative cost for institutional punishment or reward, and thus it remained unclear the specific forms of optimal incentive strategies with the minimal cost for the evolution of cooperation when the per capita incentive is considered as a control variable.

Recently, optimal control theory [50, 51, 52] has been introduced to study evolutionary game dynamics [53, 54, 55], which seems to be an alternative approach for studying the optimal incentive strategy with the minimal cost for the evolution of cooperation. For example Griffin and Belmonte have studied the problem of designing an optimal success tax in a three-species public goods game and formulated the problem as an optimal control problem [53]. Furthermore Bravetti and Padilla have introduced an extension of the replicator equation, called the optimal replicator equation, by using optimal control theory [54]. Additionally Wang et al. have studied the problem of designing reward and compensation control strategies for promoting the evolution of cooperation in the prisoner’s dilemma game [55]. However, it is worth mentioning that these previous works do not involve the investigation of the optimal incentive strategies with the minimal cumulative cost for the evolution of cooperation, and thus it could be interesting to study the optimal strategies of institutional incentives by using optimal control theory.

In this work, we thus consider the per capita incentive as the control variable in the replicator equation for the public goods game with institutional punishment or reward and aim to explore the optimal control protocols for the control variable in well-mixed populations. Based on optimal control theory, we set a cost function as the objective functional and formulate two optimal control problems for institutional reward and punishment respectively. By using the approach of Hamilton-Jacobin-Bellman equation (HJB equation) [50, 51, 52], we theoretically obtain the specific optimal control protocols for institutional punishment and reward respectively such that the set objective functional is minimized, i.e., the cumulative cost amount is minimized. We find that the optimal control incentive strategy with the minimal cumulative cost can make the system converge to the homogeneous state of full cooperation. In addition, we provide numerical examples to respectively confirm that the amount of cumulative cost for the optimal control strategy of positive (negative) incentive is lowest when comparing with other given protocols of positive (negative) incentives. We further find that the amount of cumulative cost for the obtained optimal negative incentive strategy is lower than that for the obtained optimal positive incentive strategy in almost all the initial conditions.

2 Model and methods

2.1 Public goods game with dynamic incentives

Our model is based on the public goods game (PGG) in an infinitely large, well-mixed population, in which each individual can choose to cooperate or defect. In the game, individuals are randomly selected from the population to form a $n$ -player group with $n\geq 2$ . In the formed group, each cooperator contributes the fixed amount of investment $c>0$ to the common pool, while a defector contributes nothing. The total investments to the pool are then multiplied by a synergy factor $r>1$ and allocated equally among all the $n$ group members. The social dilemma arises when $r<n$ and nobody prefers to cooperate if no external incentive mechanism is considered [35].

The basic model can be extended by institutional incentives [34, 35]. We assume that apart from the payoffs originated from the PGG, each group receives the total incentives stipulated by an institution in the form $nu$ to be used either for rewarding cooperators or for punishing defectors, where $u\;(u>0)$ is the per capita incentive. If the total incentives are used for rewarding, each cooperator obtains a reward $anu/(n_{C}+1)$ , where $n_{C}$ denotes the number of cooperators among the other $n-1$ players and the factor $a$ denotes the leverage of rewarding [34]. Accordingly, a cooperator receives the payoff

[TABLE]

while a defector in the same group receives

[TABLE]

If the total incentives are used for punishing defectors, individuals who defect have their payoff reduced by $bnu/(n_{D}+1)$ , where $b$ characterizes the leverage of punishment [34] and $n_{D}$ denotes the number of defectors among the other $n-1$ group members. Accordingly, a defector receives

[TABLE]

payoff, while a cooperator in the same group receives

[TABLE]

In the present work we consider rewarding and punishing in isolation and assume that the total incentives can be only used for rewarding or punishing like Ref. [35]. But differently, we assume that the per capita incentive is not fixed, and can be adjusted during the evolutionary process. We then aim to explore the optimal control protocols of $u$ for institutional reward and punishment respectively. For the sake of proper comparison, we assume that $a=b=1$ in this work.

2.2 Replicator equation

We then use the replicator equation to study how the frequencies of different strategies alter in infinitely large, well-mixed populations. Here, we suppose a large population, a fraction $x$ of which encompasses cooperators, and the remaining fraction $1-x$ being defectors. In consequence, the system of replicator equation can be written as [56]

[TABLE]

where the average payoffs of cooperators and defectors can be respectively given as

[TABLE]

When institutional reward is used, the average payoffs of cooperators and defectors can be respectively written as [31, 35]

[TABLE]

and

[TABLE]

Accordingly, the replicator equation becomes

[TABLE]

Whereas when institutional punishment is used, the average payoffs of cooperators and defectors can be respectively written as [31, 35]

[TABLE]

and

[TABLE]

Accordingly, the replicator equation becomes

[TABLE]

2.3 Optimal control problems of incentive strategies

In theory, cooperative behavior will occur as long as the amount of incentives outweights the payoff difference between cooperators and defectors, no matter whether incentive is positive or negative. But implementing incentives is costly and thus we aim to explore the optimal control protocols of $u$ with the minimal amount of cumulative cost for implementing incentives. To do that, we first define a cost function as the objective functional, which is given as

[TABLE]

where $t_{0}$ is the initial time and $t_{f}$ is the terminal time for the system. In our work, the initial time is $t_{0}=0$ . Based on the above definition, we can then explore the optimal incentive strategies during the evolutionary period between [math] and $t_{f}$ using optimal control theory [50, 51, 52]. We consider that $t_{f}$ is not fixed, but the fraction of cooperators at $t_{f}$ is fixed at $x(t_{f})$ . Considering the promotion effects of institutional incentives on cooperation, we further assume that $x(t_{f})=1-\delta>x_{0}$ , where $\delta$ is a parameter determining the cooperation level at the terminal time satisfying $0\leq\delta<1$ , and $x_{0}$ is the initial cooperation level in the population.

Based on the above description, we now formulate the two following optimal control problems for institutional reward and punishment respectively. For institutional reward, we have

[TABLE]

Analogously, for institutional punishment we have

[TABLE]

Here the cost function $J$ characterizes the cumulative cost of one interaction group on average during the period $[0,t_{f}]$ for the dynamical system described by the replicator equation starting from the initial state $x_{0}$ to the terminal state $1-\delta$ . Thus the quantity $\min J$ can work as the performance index of calculating the optimal control protocol of $u$ with the minimal cumulative cost in a well-mixed population. In what follows, we will use optimal control theory, especially the approach of HJB equation [50, 51, 52], to solve the two optimal control problems. Specifically, we aim to study whether there exist the optimal control protocols of $u$ which can make the cost function minimized and what their expressions are if exist.

3 Results

3.1 Optimal control strategy of positive incentive

Using the approach of HJB equation, we can theoretically obtain the optimal control strategy for institutional reward $u_{R}^{*}(t)$ as (for more detailed theoretical calculations see Appendix A)

[TABLE]

Accordingly, with the optimal control strategy $u_{R}^{*}(t)$ , the dynamical system described by Eq. (3) becomes

[TABLE]

where the initial condition is $x(0)=x_{0}$ .

Solving the above differential equation, we have

[TABLE]

In order to confirm the above theoretical results for $u_{R}^{*}(t)$ , we would like to provide some numerical examples in what follows. Before that, we point out how to compute the amount of cumulative cost for a given control protocol of $u$ . When the initial state and the terminal state are given, we first determine the value of $t_{f}$ based on Eq. (3). We can then obtain the cumulative cost amount using Eq. (5). However, when $\delta$ is set to zero, that is, when the terminal state is $x(t_{f})=1$ , we find that for some given control protocols of $u$ the system needs to take an infinitely long time to reach the terminal state $x(t_{f})=1$ . In this case, we determine the value of $t_{f}$ by considering the $\delta$ value is sufficiently small, i.e., $\delta\rightarrow 0$ .

In Fig. 1, we show the fraction of cooperators as a function of time for the obtained optimal rewarding strategy $u_{R}^{*}(t)$ and two other given rewarding strategies $u(t)=0.5$ and $u(t)=40(1-x)$ . We find that all these three rewarding strategies can make the dynamical system reach the terminal state and eventually converge to the full cooperation state. But the optimal $u_{R}^{*}(t)$ does not lead to the fastest increase of the fraction of cooperators at the early stage of evolution. Notably, from the inset table of Fig. 1 we find that the amount of cumulative cost for the system to reach the expected terminal state $x(t_{f})=0.99$ from the initial state $x_{0}=0.5$ is about $68.57$ for $u_{R}^{*}(t)$ , which is lowest when it is compared with the other two cost values. Thus our numerical calculations support our theoretical results and confirm that the obtained control protocol $u_{R}^{*}(t)$ is superior and dominates other control protocols in the cost of implementing positive incentives.

3.2 Optimal control strategy of negative incentive

Similarly, using the approach of HJB equation we can theoretically obtain the optimal control strategy for institutional punishment $u_{P}^{*}(t)$ as (for more detailed theoretical calculations please see Appendix B)

[TABLE]

Accordingly, with the optimal control strategy $u_{P}^{*}(t)$ , the dynamical system described by Eq. (4) becomes

[TABLE]

where the initial condition is $x(0)=x_{0}$ .

Solving the above differential equation, we have

[TABLE]

In order to confirm the above theoretical results, we present some numerical calculations as shown in Fig. 2. We show that the fraction of cooperators as a function of time for the obtained optimal punishing strategy $u_{P}^{*}(t)$ and two other given punishing strategies $u(t)=0.5$ and $u(t)=9(1-x)$ . We find that all these three punishing strategies can make the dynamical system reach the desired terminal state and eventually converge to the full cooperation state, but the optimal $u_{P}^{*}(t)$ makes the slowest increase in cooperation level at the early stage of evolution among the three protocols. Noticeably, we still find that the amount of cumulation cost for the system to reach the terminal state $x(t_{f})=0.99$ from the initial state $x_{0}=0.5$ for the strategy $u_{P}^{*}(t)$ is about $7.79$ , which is lower than the other two cost values, as illustrated by the inset table of Fig. 2. Thus our numerical calculations confirm that the obtained control protocol $u_{P}^{*}(t)$ is better than other control protocols in the cost of implementing negative incentives and support our theoretical predictions.

3.3 Comparison between optimal positive and negative incentive strategies

Finally, we are interested in making a comparison between the optimal rewarding strategy $u_{R}^{*}(t)$ and the optimal punishing strategy $u_{P}^{*}(t)$ . Surprisingly and interestingly, we find that no matter whether the optimal strategy $u_{R}^{*}(t)$ or $u_{P}^{*}(t)$ is used, the dynamical system will become the identical equation $\dot{x}=c(n-r)x(1-x)/n$ . This means that both the optimal rewarding and punishing strategies can not only make the dynamical system converge to the full cooperation state, but also have the same value of $t_{f}$ and the same evolutionary trajectory if the model parameter values are identical. However, due to the expression difference of $u_{R}^{*}(t)$ and $u_{P}^{*}(t)$ we can predict that the two cumulative cost amounts should be different.

In order to have a clear distinction between the two cumulative cost amounts in different conditions, we thus show the cumulative cost values as a function of the initial cooperation level for $u_{R}^{*}(t)$ and $u_{P}^{*}(t)$ , as presented in Fig. 3. We can observe that the cumulative cost monotonically decreases with increasing the initial cooperation level for both optimal positive and negative incentive strategies. In addition, we find that the cumulative cost value of the optimal punishing strategy is lower than that of the optimal rewarding strategy for almost all the $x_{0}$ values. This indicates that both the optimal rewarding and punishing strategies can make the system reach the same terminal state from the same initial state, but the latter requires less cost. From this viewpoint, the optimal punishing strategy is better than the optimal rewarding strategy.

4 Discussion

Prosocial incentive can always promote cooperation, but implementing incentive is costly [57, 58, 59, 60]. From the viewpoint of optimization, institutions prefer to adopt the incentive strategy which is able to not only promote cooperation, but also require a low cost [34, 35]. In this work, by combining optimal control theory and evolutionary game approach, we have thus studied two optimal control problems of institutional incentives in a well-mixed population to explore the optimal rewarding and punishing strategies respectively. By using the method of HJB equation, we theoretically obtain the optimal rewarding and punishing strategies respectively, which can make the cumulative cost minimized and make the dynamical system reach the desired terminal state. Additionally, we provide numerical examples to respectively compare the obtained optimal incentive strategies with other given incentive ones which further confirm our theoretical results. Furthermore, when we compare the optimal rewarding strategy with the optimal punishing strategy, we find that the cumulative cost value for the former is larger than that for the latter in almost all the possible initial conditions by numerical calculations.

In this work, the dynamical system described by the replicator equation is nonlinear, thus it is not easy to obtain the exact expression of the optimal control protocols for institutional incentives in theory [53, 54, 55]. However, by employing the approach of HJB equation for continuous-time systems, we can succeed in theoretically obtaining the optimal control strategies for institutional incentives. To our best knowledge this is the first time when this dynamic programming method [50, 51, 52] is used in the framework of evolutionary game system. Additionally, these theoretical results are verified by numerical examples.

We find that both the optimal rewarding and punishing strategies can make the dynamical system have the same evolutionary trajectory eventually converging to the full cooperation state, but the former requires less cumulative cost than the latter. Indeed previous works reveal that institutional punishment is a cheaper way of leading the system to reaching the desired cooperation level than applying reward [35]. On the one hand, our work thus further confirms this conclusion. On the other hand, however, our work establishes a quantitative index characterizing how much cumulative cost is needed for a given incentive strategy to make the system reach the expected evolutionary destination. Thus, we can quantify how cheaper the optimal punishing strategy is than the optimal rewarding strategy.

A previous work shows that prosocial punishment promotes cooperation, but it does not increase the average payoff of individuals and even reduces the average payoff in some cases [61]. Thus people gaining high payoff prefer not to use punishment from the viewpoint of individual-level internal interactions. But institutions, as a top-down-like control mechanism, may not tend to enforce rewarding or punishing on individuals according to individual-level internal interactions. Instead, since institutions need to pay for providing incentives, policy-makers may prefer institutions which not only promote cooperative behaviors, but also are much cheaper [62]. Thus our work provides clear arguments why the use of punishment is preferred in human society.

For the above described reason our work only considers the optimal control problems to explore the optimal control strategies with the minimal cumulative cost. Indeed, the time for the system to reach the desired state is also an important quantity for the governance of the public goods [34, 63, 64]. Thus it could be interesting to consider the performance index of time as the objective functional into the optimal control problems of institutional incentives as a possible extension. Work along this line is in progress.

Appendix A: Theoretical calculations of optimal rewarding strategy

For solving the optimal control problem described by Eq. (6) to theoretically obtain the optimal positive incentive strategy, we use the approach of HJB equation [50, 51, 52]. To begin, we first define the function $f(x,u,t)$ as

[TABLE]

Here $f(x,u,t)$ is a continuously differentiable function coming from the right-hand part of Eq. (3) and $u$ is the control variable about positive incentive.

Accordingly, we define the Hamiltonian function $H_{R}(x,u,t)$ for the equation system as

[TABLE]

where $J^{*}(x,t)$ is the optimal cost function of $x$ and $t$ for the rewarding strategy given as $J^{*}(x,t)=\int^{t_{f}}_{0}\frac{[nu_{R}^{*}(t)]^{2}}{2}dt$ .

Solving $\frac{\partial H_{R}}{\partial u}=0$ , we know that the optimal positive incentive $u_{R}^{*}(t)$ should satisfy

[TABLE]

In general, in order to obtain the optimal control protocol, we need to solve the nonlinear canonical equations [50, 51, 52]. However, it is quite difficult to solve the equations directly. Instead, we use the dynamic programming method, HJB equation for continuous-time systems [50, 51, 52], to solve the optimal control problem. Consequently, the HJB equation for the dynamical system with positive incentive can be written as

[TABLE]

By substituting Eqs. (A2) and (A3) into the above HJB equation, we have

[TABLE]

Since we assume that the terminal time $t_{f}$ is free, the optimal cost function $J^{*}(x,t)$ is independent of $t$ . Consequently, we have

[TABLE]

We then yield from Eq. (A4)

[TABLE]

Since the per capita incentive $u>0$ and $x\in(0,1)$ , from Eq. (A3) we have

[TABLE]

Therefore only $\frac{\partial J^{*}}{\partial x}=-\frac{2n(n-r)cx}{(1-x)[1-(1-x)^{n}]^{2}}$ holds. By substituting this equation into Eq. (A3), we obtain the optimal rewarding strategy $u_{R}^{*}(t)$ as

[TABLE]

With the optimal rewarding strategy $u_{R}^{*}(t)$ , the dynamical system thus becomes

[TABLE]

where the initial condition is $x(0)=x_{0}$ .

By solving Eq. (A9), we finally obtain the solution of $x(t)$ with the optimal rewarding strategy as

[TABLE]

Appendix B: Theoretical calculations of optimal punishing strategy

We solve the optimal control problem described by Eq. (7) to theoretically obtain the optimal negative incentive strategy by using a similar calculation described in Appendix A. To do that, we first define the function $g(x,u,t)$ as

[TABLE]

Here $g(x,u,t)$ is a continuously differentiable function coming from the right-hand part of Eq. (4) and $u$ is the control variable about negative incentive.

Accordingly, we define the Hamiltonian function $H_{P}(x,u,t)$ for the equation system as

[TABLE]

where $J^{*}(x,t)$ is the optimal cost function of $x$ and $t$ for the punishing strategy given as $J^{*}(x,t)=\int^{t_{f}}_{0}\frac{[nu_{P}^{*}(t)]^{2}}{2}dt$ .

Solving $\frac{\partial H_{P}}{\partial u}=0$ , we know that the optimal negative incentive $u_{P}^{*}(t)$ should satisfy

[TABLE]

Similarly, we use the approach of HJB equation to obtain $u_{P}^{*}(t)$ by solving Eq. (B3). And the HJB equation for the dynamical system with negative incentive can be written as

[TABLE]

By substituting Eqs. (B2) and (B3) into the above HJB equation, we have

[TABLE]

Since we assume that $t_{f}$ is not fixed, the optimal cost function $J^{*}(x,t)$ is independent of $t$ . In other words, we have

[TABLE]

From Eq. (B4) we then obtain

[TABLE]

Since the per capita incentive $u>0$ and $x\in(0,1)$ , from Eq. (B3) we have

[TABLE]

Therefore only $\frac{\partial J^{*}}{\partial x}=-\frac{2nc(n-r)(1-x)}{x(1-x^{n})^{2}}$ holds. If substituting this equation into Eq. (B3), we obtain the optimal rewarding strategy $u_{P}^{*}(t)$ as

[TABLE]

With the optimal punishing strategy $u_{P}^{*}(t)$ , the dynamical system thus becomes

[TABLE]

where the initial condition is $x(0)=x_{0}$ .

After solving Eq. (B9), we finally obtain the solution of $x(t)$ with the optimal punishing strategy as

[TABLE]

Acknowledgments

This research was supported by the National Natural Science Foundation of China (Grant No. 61503062) and the Hungarian National Research Fund (Grant K-120785).

Bibliography64

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] Axelrod RM. The Evolution of Cooperation. Basic Books: New York; 2006.
2[2] Nowak MA. Five rules for the evolution of cooperation. Science 2006;314:1560-1563.
3[3] Rand DG, Nowak MA. Human cooperation. Trends Cogn Sci 2013;17:413-425.
4[4] Doebeli M, Hauert C. Models of cooperation based on the prisoner’s dilemma and the snowdrift game. Ecol Lett 2005;8:748-766.
5[5] Szabó G, Fáth G. Evolutionary games on graphs. Phys Rep 2007;446:97-216.
6[6] Perc M, Gómez-Gardeñes J, Szolnoki A, Floría LM, Moreno Y. Evolutionary dynamics of group interactions on structured populations: a review. J R Soc Interface 2013;10:20120997.
7[7] Wang Z, Kokubo S, Jusup M, Tanimoto J. Universal scaling for the dilemma strength in evolutionary games. Phys Life Rev 2015;14:1-30.
8[8] Ohtsuki H, Hauert C, Lieberman E, Nowak MA. A simple rule for the evolution of cooperation on graphs and social networks. Nature 2006;441:502-505.