Decoding fairness: a reinforcement learning perspective
Guozhong Zheng, Jiqiang Zhang, Xin Ou, Shengfeng Deng, and Li Chen

TL;DR
This paper demonstrates that fairness in the ultimatum game can emerge endogenously through reinforcement learning, specifically Q-learning, without relying on external factors, aligning with behavioral experiment observations.
Contribution
It introduces a reinforcement learning framework for the ultimatum game showing fairness emerges naturally from reward maximization, challenging exogenous explanations.
Findings
Fairness emerges when players consider future rewards and experiences.
The system stabilizes into fair or rational strategies over time.
Results are robust across different role assignment methods and population structures.
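The first finding can be read against the standard tabular Q-learning update. What follows is the generic textbook form, not necessarily the paper's exact notation, which this summary does not reproduce; the symbols are assumptions made for illustration.

```latex
Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\left[\, r + \gamma \max_{a'} Q(s',a') \,\right]
```

Here $\alpha \in (0,1]$ is the learning rate: a small $\alpha$ keeps more of the stored estimate, so past experience weighs heavily. $\gamma \in [0,1)$ is the discount factor: a large $\gamma$ gives more weight to expected future rewards.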
Abstract
Behavioral experiments on the ultimatum game (UG) reveal that humans prefer fair acts, contradicting the prediction of orthodox economics. Existing explanations, however, mostly attribute this preference to exogenous factors within the imitation learning framework. Here, we adopt the reinforcement learning paradigm, where individuals make their moves aiming to maximize their accumulated rewards. Specifically, we apply Q-learning to UG, where each player is assigned two Q-tables to guide decisions for the roles of proposer and responder. In a two-player scenario, fairness emerges prominently when both experiences and future rewards are appreciated. In particular, the probability of successful deals increases with higher offers, which aligns with observations in behavioral experiments. Our mechanism analysis reveals that the system undergoes two phases, eventually stabilizing into fair…
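The setup described in the abstract, two players who each hold one Q-table per role, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three offer levels in `LEVELS`, the state encoding (the previous round's offer and threshold), epsilon-greedy exploration, uniformly random role assignment, and all parameter values are hypothetical choices made here. As a further simplification, each role's table bootstraps from its own next-state values even though the player may switch roles in the next round.

```python
import random

# Hypothetical discretized offers / acceptance thresholds (an assumption).
LEVELS = [0.3, 0.5, 0.8]
N = len(LEVELS)
ROLES = ("proposer", "responder")


class Player:
    """A Q-learner holding two Q-tables, one per role."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.01, rng=None):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = rng or random.Random()
        # Rows are states (previous offer/threshold pair), columns are actions.
        self.q = {role: [[0.0] * N for _ in range(N * N)] for role in ROLES}

    def act(self, role, state):
        # Epsilon-greedy choice with random tie-breaking.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(N)
        row = self.q[role][state]
        best = max(row)
        return self.rng.choice([a for a, v in enumerate(row) if v == best])

    def learn(self, role, state, action, reward, next_state):
        # Standard tabular Q-learning update.
        row = self.q[role][state]
        target = reward + self.gamma * max(self.q[role][next_state])
        row[action] = (1 - self.alpha) * row[action] + self.alpha * target


def play_round(proposer, responder, state):
    """Play one ultimatum game and update both players' role tables."""
    p = proposer.act("proposer", state)    # index of the offer
    q = responder.act("responder", state)  # index of the acceptance threshold
    deal = LEVELS[p] >= LEVELS[q]          # deal succeeds if offer meets threshold
    r_prop = 1.0 - LEVELS[p] if deal else 0.0
    r_resp = LEVELS[p] if deal else 0.0
    next_state = p * N + q
    proposer.learn("proposer", state, p, r_prop, next_state)
    responder.learn("responder", state, q, r_resp, next_state)
    return next_state, deal, (r_prop, r_resp)


def simulate(rounds=10_000, seed=0):
    """Return the fraction of successful deals over a run."""
    rng = random.Random(seed)
    players = [Player(rng=rng), Player(rng=rng)]
    state, deals = 0, 0
    for _ in range(rounds):
        i = rng.randrange(2)  # random role assignment (an assumption)
        state, deal, _ = play_round(players[i], players[1 - i], state)
        deals += deal
    return deals / rounds
```

Calling `simulate()` returns the share of rounds that ended in a deal. Varying `alpha` and `gamma` in `Player` is one way to explore how the weight on experience and on future rewards affects the outcome, though this sketch makes no claim to reproduce the paper's results.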
Taxonomy
Topics: Experimental Behavioral Economics Studies · Economic and Technological Innovation
Methods: ADaptive gradient method with the OPTimal convergence rate · Q-Learning
