Reactive means in the Iterated Prisoner's Dilemma
Grant Molnar, Caroline Hammond, and Feng Fu

TL;DR
This paper introduces morality metrics for the Iterated Prisoner's Dilemma, focusing on reactive strategies, to evaluate fairness and goodness, aiding the comparison of advanced strategies based on these moral measures.
Contribution
It proposes a set of morality metrics and computes reactive means for these metrics, providing a new way to assess strategies in IPD and ISG.
Findings
Reactive means for morality metrics are computed.
Certain morality functions are anticorrelated with success.
Metrics help compare strategies based on fairness and goodness.
Reactive means in the Iterated Prisoner’s Dilemma
Grant Molnar¹, Caroline Hammond¹, Feng Fu¹,²
¹ Department of Mathematics, Dartmouth College, Hanover, NH 03755, USA
² Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, NH 03756, USA
Abstract
The Iterated Prisoner’s Dilemma (IPD) is a well-studied framework for understanding direct reciprocity and cooperation in pairwise encounters. However, measures of the morality of IPD strategies are still largely lacking. Here, we partially address this issue by proposing a suite of plausible morality metrics to quantify four aspects of justice. We focus our closed-form calculations on the class of reactive strategies because of their mathematical tractability and expressive power. We define reactive means as a tool for studying how actors in the IPD and Iterated Snowdrift Game (ISG) behave under typical circumstances. We compute reactive means for four functions intended to capture human intuitions about “goodness” and “fair play”. Two of these functions are strongly anticorrelated with success in the IPD and ISG, and the other two are weakly anticorrelated with success. Our results will aid in evaluating and comparing powerful IPD strategies based on machine learning algorithms, using simple and intuitive morality metrics.
1 Introduction
Iterated games, most notably the Iterated Prisoner’s Dilemma (IPD), have been objects of intensive study at least since Axelrod’s classic experiments [5]. Much research has been devoted to determining which strategies perform well under various circumstances [23, 26, 28]; the IPD has been studied in the presence of noise [23], under social dynamics [1], and with other variations [22, 1, 8, 18, 33]. The IPD framework, and evolutionary game theory more generally, has offered profound insights into the evolution of cooperation [6, 24, 14, 25, 11, 32].
In particular, the discovery of zero-determinant (ZD) strategies by Press and Dyson has reinvigorated the field with new perspectives [26, 29, 10, 9, 19]. ZD strategies are able to unilaterally enforce a linear relationship between their own average payoff and that of their co-player. An extortionate ZD player can thus use a deliberately chosen ZD strategy to demand an unfair share of the payoff from their mutual interactions. Motivated by this fact, researchers have attempted to classify IPD strategies, for example, into partners versus rivals, by their capacity for fostering mutual cooperation or securing unilateral wins [2, 3, 17]. This dichotomous classification has a natural extension to the idea of morality. Human behavior is not solely guided by the desire to win, but also by moral values and judgments [30]. While such a classification of ZD strategies might be enlightening, there is also a strong need for studying the morality of IPD strategies more broadly.
In [28], Singer-Clark investigates the question of which IPD strategies are the “most moral” using a different methodology. Under this framework, a player in an IPD treats their competitor well if they cooperate a large proportion of the time. Singer-Clark uses eigenvalues on a population of such strategies to define two measures, EigenJesus and EigenMoses, for which strategy was “most moral”. This is a fascinating approach, but it has some serious drawbacks. One is that it does not incorporate noise or error, making it less applicable to real-world scenarios. It is also unclear how to generalize Singer-Clark’s methodology from games whose choices have a clear social valence, like the Prisoner’s Dilemma and Snowdrift, to other more complicated or nuanced games. A third drawback is the atemporality of these metrics: they do not pay attention to which player defected first, only which one defected more. Most seriously from our perspective, Singer-Clark’s eigenvalue-based morality is socially contingent. That is, for Singer-Clark, an actor’s morality depends on who they are playing against. It is natural to ask: is there some way of determining how a player behaves without relying on these variable social contingencies? We pursue this line of inquiry by introducing another set of metrics to assess the morality of IPD and Iterated Snowdrift Game (ISG) strategies.
In the rest of Section 1, we introduce the mechanics of the games and strategies analyzed in this paper, define the reactive mean, and summarize our results. Section 2 defines the player-oriented functions that will be analyzed. Section 3 provides some statistics about the reactive means for our functions of interest. The final section, Section 4, discusses further applications of reactive means. The explicit calculations for the reactive means are provided in the Appendix.
1.1 Games of Interest
The Prisoner’s Dilemma is a simple game. Fix a tuple of real numbers $(R, S, T, P)$ such that
$$T > R > P > S \quad \text{and} \quad 2R > T + S.$$
Each player chooses to cooperate ($C$) or defect ($D$). If both players cooperate, they each receive a reward $R$. If one player cooperates and the other player defects, then the cooperating player receives reward $S$, and the defecting player receives reward $T$. If both players defect, they each receive reward $P$. By construction, players are collectively best off when they both cooperate, but are individually better off when they individually defect. Axelrod chose $(R, S, T, P) = (3, 0, 5, 1)$ for his famous tournaments [5], and so these values are standard in much of the literature, but any game with payoffs satisfying the inequalities above qualifies as a Prisoner’s Dilemma. An IPD is simply a Prisoner’s Dilemma played repeatedly between the same two individuals.
The Snowdrift Game is formally almost identical to the Prisoner’s Dilemma but with a different payoff structure [31]. As above, we fix a tuple of real numbers $(R, S, T, P)$, and each player can choose to cooperate ($C$) or defect ($D$), with commensurate payoffs. However, we now ask that
[TABLE]
Therefore, no value is destroyed when a player defects against a cooperative adversary, and a player is better off cooperating than defecting against a defector. In accordance with [31] and [5], it is common to use for the Snowdrift game. The following scenario provides one interpretation of this game: two individuals are driving up an icy road when they discover a snowdrift blocking the way. As long as at least one of them digs, the snowdrift will be removed and both can keep driving. However, neither individual enjoys the process of digging. The game of “Chicken” provides another interpretation of the Snowdrift Game. From this perspective, the players are car drivers heading towards each other to prove their courage. If either one pulls out early, then both survive, and the player who didn’t flinch also accrues accolades and honor. If both pull out simultaneously, the honor is split evenly between them. If neither pulls out, they both die. Analogously to the IPD, an ISG is a Snowdrift Game played repeatedly between the same two opponents.
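As a small illustration of how the two games differ, the sketch below encodes their payoff orderings in Python. The function names are hypothetical. The Prisoner's Dilemma check uses the standard conditions T > R > P > S and 2R > T + S, and the Snowdrift check assumes only the commonly used ordering T > R > S > P. The Snowdrift payoff values in the example are an assumption chosen for illustration, not values taken from the text.

```python
def is_prisoners_dilemma(R, S, T, P):
    """Standard Prisoner's Dilemma conditions on the payoffs."""
    return T > R > P > S and 2 * R > T + S


def is_snowdrift(R, S, T, P):
    """Assumed Snowdrift ordering: cooperating against a defector (S)
    now beats mutual defection (P)."""
    return T > R > S > P


# Axelrod's tournament payoffs.
axelrod = dict(R=3, S=0, T=5, P=1)
# Illustrative Snowdrift payoffs (an assumption for this sketch).
snowdrift_example = dict(R=3, S=1, T=5, P=0)

print(is_prisoners_dilemma(**axelrod))       # True
print(is_snowdrift(**axelrod))               # False
print(is_snowdrift(**snowdrift_example))     # True
```

The only structural difference between the two checks is the relative order of S and P, which is what makes cooperating against a defector worthwhile in the Snowdrift Game.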
1.2 Strategies of Interest
To investigate this idea, we analyze the behavior of a specific category of strategies. A reactive memory one strategy is a triple , where is the probability that cooperates on the first round of the game, and is the probability that cooperates if their opponent made move in the previous round.
Let and be reactive memory one strategies. We define , and we define .
If is a function depending on and , we write for the function that interchanges the roles of and . For instance, , and .
We write
[TABLE]
for the probability distribution of cooperation and defection for and after rounds, and observe
[TABLE]
We also write
[TABLE]
for the transition matrix of the Markov process indicated above. Clearly, we have for all . If is mixing, then there is a unique steady-state distribution
[TABLE]
for and . Let denote the long-run probability that cooperates in any given round; thus, is the probability that cooperates in any given round. In [23], Nowak proved that
[TABLE]
and gave the following formulas for and :
[TABLE]
Here and . The quantity measures the responsiveness of ; that is, the degree to which treats adversaries who cooperate better than adversaries who defect. Writing as a function of and , we find , as we should expect.
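The long-run cooperation probabilities can be sketched in a few lines of Python. The notation here is an assumption made for this sketch: a reactive strategy is written as a pair (p, q), where p is the probability of cooperating after the opponent cooperated and q is the probability of cooperating after the opponent defected, and r = p - q is the responsiveness. The closed form used is the standard one for reactive strategies, obtained by solving c = q + r c' and c' = q' + r' c; the function names are hypothetical.

```python
def cooperation_rates(p, q, p2, q2):
    """Long-run cooperation probabilities (c, c2) of two reactive
    strategies X = (p, q) and Y = (p2, q2)."""
    r, r2 = p - q, p2 - q2          # responsiveness of X and of Y
    denom = 1.0 - r * r2
    if abs(denom) < 1e-12:
        # r * r2 = 1, e.g. Tit-for-Tat against Tit-for-Tat: the long-run
        # behavior then depends on the opening moves.
        raise ValueError("long-run rates are not unique when r * r2 = 1")
    c = (q + r * q2) / denom
    c2 = (q2 + r2 * q) / denom
    return c, c2


def stationary_distribution(p, q, p2, q2):
    """Long-run probabilities of the outcomes (CC, CD, DC, DD),
    using the product form of the steady state."""
    c, c2 = cooperation_rates(p, q, p2, q2)
    return (c * c2, c * (1 - c2), (1 - c) * c2, (1 - c) * (1 - c2))


# Tit-for-Tat (1, 0) against the Bully (0, 1).
print(cooperation_rates(1, 0, 0, 1))        # (0.5, 0.5)
print(stationary_distribution(1, 0, 0, 1))  # (0.25, 0.25, 0.25, 0.25)
```

Swapping the two strategies in the call swaps the two returned rates, which mirrors the symmetry noted in the text.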
Let be an integrable function of two strategies and , and write . We define
[TABLE]
to be the reactive mean of for . Note that , so in fact
[TABLE]
The quantity measures the expected value of when an adversary for is chosen uniformly at random from the set of reactive memory one strategies. If is independent of , then
[TABLE]
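To make the definition concrete, the following is a minimal numerical sketch of a reactive mean, assuming the same (p, q) notation for reactive strategies as in the standard literature and approximating the integral over the opponent's strategy by a midpoint rule on the unit square. The function names and the grid size are hypothetical choices for illustration; the closed forms in the Appendix are not used here.

```python
def cooperation_rate(p, q, p2, q2):
    """Long-run probability that X = (p, q) cooperates against Y = (p2, q2)."""
    r, r2 = p - q, p2 - q2
    return (q + r * q2) / (1.0 - r * r2)


def reactive_mean(f, p, q, n=100):
    """Approximate the average of f(p, q, p2, q2) over opponents (p2, q2)
    drawn uniformly from the unit square, using an n-by-n midpoint grid.
    Midpoints avoid the corner cases where 1 - r * r2 vanishes."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        p2 = (i + 0.5) * h
        for j in range(n):
            q2 = (j + 0.5) * h
            total += f(p, q, p2, q2)
    return total * h * h


# A strategy with p = q ignores its opponent, so its cooperation rate does
# not depend on Y and the reactive mean equals that rate.
print(reactive_mean(cooperation_rate, 0.3, 0.3))
# A strategy close to Tit-for-Tat, averaged over all reactive opponents.
print(reactive_mean(cooperation_rate, 0.99, 0.01))
```

The first call illustrates the remark that the reactive mean of a function independent of the opponent is just the function's value.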
1.3 Summary of Results
We show that a player’s score in both the IPD and ISG is negatively correlated, to varying degrees, with all four of the metrics for justice delineated in Section 2. Our methods are especially appealing because they give objective measures of the behavior of actors. Unlike the EigenMoses and EigenJesus metrics calculated in [28], the reactive means of asymptotic niceness, long-term cooperation rate, responsiveness, and reciprocity (see Section 2 below) do not depend on the behaviors of individual opponents. As a result, players can be assigned a fixed level of morality that does not change with the population of opponents. In addition, the measures of morality obtained in this paper apply to opponents of every possible reactive memory one strategy, of which there are infinitely many. As long as the values of and are known, it is straightforward to evaluate any of these morality functions. Also, the methodology used in this paper incorporates noise to some degree, since a memory one strategy with added noise is essentially just another memory one strategy. Of course, this does not cover all forms of noise, since noise can cause a strategy to shift dynamically, but it does make the results more realistic.
2 Model and Methods
Metrics of Justice.
In modern parlance, “justice” means “retribution”. Historically, however, justice was equated with social morality writ large, subsuming concepts like fair play, and treating other people well. We take inspiration from these intuitions to enumerate a few loose criteria for just actions.
1. A player is just insofar as they treat kind players well.
2. A player is just insofar as they treat other players well.
3. A player is just insofar as they treat other players well when the others treat them well.
4. A player is just insofar as they treat other players as the others treat them.
Inspired by these intuitions, we develop various metrics which correspond to our folk sense of justice.
2.1 Asymptotic Niceness
Axelrod observed that the most successful strategies in his tournament were “nice” in the sense that they did not defect before their adversaries did [5, p.10]. This is not an especially useful notion for us, however, because it depends intimately on , whereas our focus is on long-run behavior. Thus we define the (asymptotic) niceness of against to be the long-run probability that if and cooperate in the same round, subsequently defects before . Thus if is a reactive memory one strategy, and for every reactive memory one strategy , then , and if for every reactive memory one strategy , then .
2.2 Reciprocity
Let denote the strategy that begins by cooperating and then reciprocates the last move of their adversary; this famous strategy is referred to as “Tit-for-Tat”. Axelrod observed that fared better than any other strategy in his tournaments. Extensive research has gone into when and how succeeds against other strategies [5, 28], but also perfectly exemplifies a willingness to reciprocate the actions of its adversaries. In a sense, acts with perfect justice. Consider by contrast “the Bully” , which begins by defecting and then defects against cooperation and cooperates against defection. The Bully exploits those who are willing to cooperate with it, while submitting to and cooperating with those who defect against it. In a sense, is the opposite of ; indeed, the coordinates are maximally distant from the coordinates in the unit square. Moreover, behaves in a way that intuitively parses as “evil”: preying on the kind, and capitulating to the cruel. We define the reciprocity of with to be the long-run probability that ’s move is the same as ’s previous move. Thus if is a reactive memory one strategy, and for every reactive memory one strategy , then and , and if for every reactive memory one strategy , then and .
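The verbal definition of reciprocity can be turned into a short computation. The sketch below assumes the (p, q) notation for a reactive strategy (probability of cooperating after the opponent's cooperation and defection, respectively) and derives the formula directly from the definition: the opponent's previous move is a cooperation with its long-run cooperation probability, in which case the player matches it with probability p, and otherwise the player matches a defection with probability 1 - q. The function names are hypothetical, and this is a derivation from the stated definition rather than a formula quoted from the paper.

```python
def opponent_cooperation_rate(p, q, p2, q2):
    """Long-run probability that Y = (p2, q2) cooperates against X = (p, q)."""
    r, r2 = p - q, p2 - q2
    return (q2 + r2 * q) / (1.0 - r * r2)


def reciprocity(p, q, p2, q2):
    """Long-run probability that X's move equals Y's previous move."""
    c2 = opponent_cooperation_rate(p, q, p2, q2)
    return c2 * p + (1.0 - c2) * (1.0 - q)


opponent = (0.4, 0.7)                    # an arbitrary reactive opponent
print(reciprocity(1, 0, *opponent))      # Tit-for-Tat always reciprocates
print(reciprocity(0, 1, *opponent))      # the Bully never reciprocates
print(reciprocity(0.5, 0.5, *opponent))  # a coin-flipper matches half the time
```

Tit-for-Tat and the Bully sit at the two extremes of this quantity regardless of the opponent, which matches the description above.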
2.3 Functions of Interest
Define as the long-run average score that earns each round against in the IPD. For instance, if is Tit-for-Tat, and is the Bully, then , and using the values from classic literature mentioned in Section 1. Likewise, let denote the long-run average score that earns each round against in the ISG. For instance, if is Tit-for-Tat, and is the Bully, then , and .
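The Tit-for-Tat versus Bully example can be checked by direct simulation, since both strategies are deterministic. The sketch below is illustrative only: the encoding of a strategy as (first move, reply to C, reply to D), the function name, and the use of Axelrod's payoffs R = 3, S = 0, T = 5, P = 1 for the IPD are assumptions of this sketch.

```python
def average_scores(strategy_x, strategy_y, rounds, R=3, S=0, T=5, P=1):
    """Play two deterministic reactive strategies against each other and
    return their average per-round scores. A strategy is a tuple
    (first_move, reply_to_C, reply_to_D) with moves 'C' or 'D'."""
    payoff = {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}
    x, y = strategy_x[0], strategy_y[0]
    total_x = total_y = 0
    for _ in range(rounds):
        total_x += payoff[(x, y)]
        total_y += payoff[(y, x)]
        # Each player reacts to the other's move in the round just played.
        x, y = (strategy_x[1] if y == 'C' else strategy_x[2],
                strategy_y[1] if x == 'C' else strategy_y[2])
    return total_x / rounds, total_y / rounds


tit_for_tat = ('C', 'C', 'D')
bully = ('D', 'D', 'C')

# Play cycles through the outcomes CD, DD, DC, CC, so over a multiple of
# four rounds each player averages (S + P + T + R) / 4.
print(average_scores(tit_for_tat, bully, rounds=4000))  # (2.25, 2.25)
```

Because the play is periodic with period four, each of the four outcomes occurs a quarter of the time in the long run.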
Let
[TABLE]
The set comprises our functions of interest for this paper. The functions constitute the fundamental building blocks for the behavior of strategy against strategy . The functions and measure the degree to which is kind to , in the sense of not initiating defection (for ), or cooperating (for ). These functions reflect criteria 1 and 2, respectively, from the list of criteria for just actions at the start of Section 2. The functions and measure the degree to which reciprocates the actions of ; in other words, the degree to which asymptotically follows the Tit-for-Tat strategy. These functions reflect criteria 3 and 4. The functions and measure the success of against in the IPD and ISG, respectively.
These quantities are intimately interconnected, and can each be expressed in terms of and . For ease of notation, we suppress dependence on and in the equations below.
[TABLE]
For each function of interest , we have an explicit formula for ; however, these formulas are generally ungainly and unedifying, so we have relegated them to the Appendix, where they are used to produce cleaner data.
3 Results and Analysis
3.1 Heat Maps
For each function of interest , we have a heat map for pictured below. is graphed with white (for low values) and dark purple (for high values) as a function of and . The scale for each heat map is given to its right. Heat maps of the complements , , , , and are also included. Neither heat maps nor statistical analyses of , , and are included because these are constant values. Additionally, , , , and are not included since =, =, =, and =, making additional analyses of these complements redundant. In the heat maps for and , the different values of correspond to those used in classic literature referred to in Section 1.
These charts are intriguing, and already suffice to give us some useful information regarding the morality metrics.
1. In order to maximize one’s score in the ISG, a player would always want to defect when their opponent cooperated in the last round. However, they would not want to defect every time their opponent defected. This suggests that is more positively correlated with than is.
2. Each of the reactive means, with the exception of scores, displays one or more symmetries. This is to be expected since there are no weights given to cooperation or defection until the scores are calculated. The symmetries evident in the stationary distributions are especially subtle since they do not display symmetry with respect to their own values, but rather with the plots of other stationary distributions.
3. Asymptotic niceness, responsiveness, and reciprocity all have a positive correlation with , while a high score is anticorrelated with this value for both the IPD and ISG.
4. Both responsiveness and reciprocity increase as the chosen strategy becomes more like Tit-for-Tat. As we know from Axelrod’s tournaments, this suggests that both of these metrics of justice are more likely to equate to victory over the opponent on average, as long as the ambient population is not excessively hostile.
3.2 Statistics
We now compute means and standard deviations for our functions of interest as well as the naive values for the complements of interest. We compute these values exactly where convenient, and otherwise numerically using 5,000 sample points distributed evenly over the parameter space. These values are consistent with the heat maps above.
[TABLE]
Next, we compute covariances.
[TABLE]
We computed the mean and covariances of the auxiliary function in order to compute covariances for , , , , and . As covariance is bilinear, the listed covariances suffice to compute covariance of , , , , , and with each of our functions of interest. We can now calculate the correlations.
[TABLE]
From the above chart, it is evident that cooperation rate and asymptotic niceness are strongly anticorrelated with success, especially the former. Niceness is slightly more beneficial in the IPD than in the ISG, while the opposite is true with respect to the cooperation rate. It is interesting that asymptotic niceness behaves this way, given that the results of Axelrod’s tournaments suggest that nice strategies perform unusually well.
Another interesting observation is that the correlation between responsiveness and success in the IPD, while slightly negative, is almost zero. This same relationship can be seen with reciprocity, as and have virtually the same dynamics and correlations. This suggests that, if one views justice as treating others how they treat you, a player can play an almost perfectly just game and be victorious against around half of the reactive memory one strategies. This idea is additionally supported by the almost identical correlations for .
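The kind of computation behind these statistics can be sketched numerically. The example below is a rough illustration, not the authors' procedure: it assumes the (p, q) notation for reactive strategies, Axelrod's IPD payoffs, the product form of the steady state, and a coarse midpoint grid in place of the exact formulas and the 5,000 sample points mentioned above. All function names and grid sizes are hypothetical.

```python
from math import sqrt


def long_run(p, q, p2, q2):
    """Long-run cooperation rates of X = (p, q) and Y = (p2, q2)."""
    r, r2 = p - q, p2 - q2
    denom = 1.0 - r * r2
    return (q + r * q2) / denom, (q2 + r2 * q) / denom


def ipd_score(c, c2, R=3.0, S=0.0, T=5.0, P=1.0):
    """Long-run average IPD payoff to X given both cooperation rates."""
    return R * c * c2 + S * c * (1 - c2) + T * (1 - c) * c2 + P * (1 - c) * (1 - c2)


def midpoints(n):
    return [(i + 0.5) / n for i in range(n)]


def reactive_means(p, q, n=24):
    """Reactive means of X's cooperation rate and IPD score."""
    coop = score = 0.0
    for p2 in midpoints(n):
        for q2 in midpoints(n):
            c, c2 = long_run(p, q, p2, q2)
            coop += c
            score += ipd_score(c, c2)
    return coop / n ** 2, score / n ** 2


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Sample the player's own strategy on a 12-by-12 grid of the unit square.
samples = [reactive_means(p, q) for p in midpoints(12) for q in midpoints(12)]
coop_means = [s[0] for s in samples]
score_means = [s[1] for s in samples]
corr = pearson(coop_means, score_means)
print(corr)  # negative: more cooperative strategies score lower on average
```

The same loop could be reused for other functions of interest by accumulating them alongside the cooperation rate and score.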
4 Discussions and Conclusion
In this work we have focused on a few simple yet intuitive morality metrics, and it is straightforward to consider various extensions in this regard. Given the subjective nature of morality, there are numerous other functions to be investigated under the framework of this paper. For instance, if a player believes they are just when they treat their opponent as they are treated, a measure of morality could be the long-run probability of making the same move as their partner in the next round, or some other variation of this aspect. In other words, there could be a measure of the probability of a player’s opponent and the player themselves cooperating in the same round or defecting in the same round. This idea is already partially captured by and , so a linear combination of these two values could be a good metric for this idea.
Another potential function of interest would be a variation on positive reciprocity that just takes into account how often a player reciprocates when their opponent cooperates. Additionally, the different functions could be broken down into different distributions to see exactly how moral one needs to be to succeed. For instance, one could examine against . This would specifically convey whether or not it is better to cooperate more with cooperators than defectors or vice versa. This idea would be especially useful for and since their correlations with score were so small in absolute value.
Beyond this, one could examine an environment where an opponent or a player is more likely to choose one strategy over another. In other words, instead of assuming the opponent chooses a strategy with uniform probability, the distribution could be a truncated Gaussian or another distribution. Similarly, a fine-grained description of their pairwise encounters (who-meets-whom relationships) can be based on graphs or networks [13]. This incorporation could help reflect the tendency of certain populations to congregate when they have shared ideals.
While the focus of this paper is on morality, reactive means have much wider applications. The reactive mean of any function that measures the behavior of two competing players can, in principle, be computed. This could lead to analyses of other ideas, such as the consistency of a player’s moves [31] or measures of a player’s success beyond their average score.
Lastly, we could take averages over families of opponents besides reactive memory one strategies or players having asymmetrical roles [20]. It would be natural, for instance, to integrate against all memory one strategies: arguably, this level of generality suffices for a total understanding of the IPD [26].
If desired, one could of course integrate over finite-memory strategies [16], or any family of strategies which bears a natural parameterization. In an adaptation of the EigenJesus and EigenMoses metrics derived in [28], one could determine the scores for a player in an environment of opponents sampled independently from the pool of all reactive memory one strategies. Under this framework, as the number of opponents increases, the measured scores will approach their reactive average values. In addition, the development of simple and intuitive metrics using reactive means will aid in evaluating and comparing IPD strategies generated in complicated ways, such as those based on machine learning algorithms, including reinforcement learning [15, 27] and particle swarm optimization [12].
In sum, we evaluate and compare the moral nature of reactive strategies employed in the Iterated Prisoner’s Dilemma (IPD) by drawing on humans’ intuitive perceptions of “fair play” and “goodness”. Using these morality metrics, we demonstrate that two of the metrics are strongly anticorrelated with success in the IPD, while the other two are only weakly anticorrelated with it. Our results may help in conceiving new ways, by integrating moral considerations, of enhancing fairness and cooperation among adaptive and learning individuals [7, 21].
Code Availability
The code used in this study is available upon reasonable request.
Acknowledgments
We thank Steve Fan for his clever comments on integration. F.F. gratefully acknowledges support from the Bill & Melinda Gates Foundation (award no. OPP1217336), the NIH COBRE Program (grant no.1P20GM130454), and the Neukom CompX Faculty Grant.
Appendix
The reactive means of the functions we defined in Section 2 may be computed numerically for fixed and , but these computations become much slower as we permit and to vary. Moreover, a double integral with both and in the denominator is difficult (but possible) to integrate symbolically.
However, if , we may write and perform the change of variables to obtain
[TABLE]
This change of variables renders our integrals much more manageable, and a straightforward but tedious computation yields
[TABLE]
[TABLE]
Here we adopt the convention that refers to the natural logarithm, rather than or .
Of course , , and . The functions and may be computed as linear combinations of , , , and . These formulas are valid except when . But if and only if , and in this case (3) suffices to compute the reactive mean of each function in our set above. Indeed, the denominators in (1) and (2) simplify to 1, and we have
[TABLE]
Now if , then is Tit-for-Tat (, ) or the Bully , ), respectively. But these boundary cases are easy to evaluate directly. Recalling that denotes Tit-for-Tat, and denotes the Bully, we have
[TABLE]
Irrespective of these considerations, we inherit the relations from Sections 1 and 2, that is, from the behavior of the functions themselves.
[TABLE]
With these identities in hand, it is now straightforward to generate the heat maps shown in Section 3 (the authors used Python). These expressions also suffice to compute exact values for most of the covariances between our functions of interest. As an example, we compute the covariance of and . Explicitly, we have
[TABLE]
The series in (8) telescope to , , and respectively. Similarly, performing a partial fraction decomposition on the terms in the series of (9) yields scaled copies of the sum
[TABLE]
possibly with some terms omitted at the beginning of the series. This perspective lets us evaluate these series as , , and . Finally, classic methods from analytic number theory [4] let us evaluate the series in (10) as and . Summing these values, we conclude that
[TABLE]
and so
[TABLE]
The other covariances we give explicit values for can be evaluated similarly.
References
[1] Prisoner’s dilemma. http://www.prisoners-dilemma.com/, 2020.
[2] Ethan Akin. What you gotta know to play good in the iterated prisoner’s dilemma. Games, 6(3):175–190, 2015.
[3] Ethan Akin. The iterated prisoner’s dilemma: good strategies and their dynamics. Ergodic Theory, Advances in Dynamical Systems, pages 77–107, 2016.
[4] Tom M Apostol. Introduction to Analytic Number Theory. Springer Science & Business Media, 1998.
[5] Robert Axelrod. Effective choice in the prisoner’s dilemma. Journal of Conflict Resolution, 24(1):3–25, 1980.
[6] Robert Axelrod and William D Hamilton. The evolution of cooperation. Science, 211(4489):1390–1396, 1981.
[7] Wolfram Barfuss, Jonathan F Donges, Vítor V Vasconcelos, Jürgen Kurths, and Simon A Levin. Caring for the future can turn tragedy into comedy for long-term collective action under risk of collapse. Proceedings of the National Academy of Sciences, 117(23):12915–12922, 2020.
[8] Ana LC Bazzan, Rafael H Bordini, and John A Campbell. Evolution of agents with moral sentiments in an iterated prisoner’s dilemma exercise. Game Theory and Decision Theory in Agent-Based Systems, pages 43–64, 2002.
