Risk Preferences of Learning Algorithms
Andreas Haupt, Aroon Narayanan

TL;DR
This paper reveals that the widely used $\epsilon$-Greedy learning algorithm naturally develops risk aversion, favoring lower-variance actions, which can impact fairness and diversity in decision-making.
Contribution
It identifies emergent risk aversion in $\epsilon$-Greedy algorithms and proposes two correction methods to restore risk neutrality.
Findings
$\epsilon$-Greedy prefers lower-variance actions with high probability.
Risk aversion persists even when higher-variance actions have higher expected payoff.
Proposed reweighting and optimistic estimation methods effectively eliminate the bias.
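The first finding can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the paper: two actions with the same expected payoff (a safe action that always pays 0 and a risky action that pays -1 or +1 with equal probability), a constant exploration rate, plain sample-mean estimates, and random tie-breaking. The function names are hypothetical.

```python
import random


def epsilon_greedy_safe_share(steps=2000, epsilon=0.1, seed=0):
    """Run epsilon-Greedy on two equal-mean arms; return the share of safe picks.

    Arm 0 is safe (always pays 0). Arm 1 is risky (pays -1 or +1, mean 0).
    These payoffs are illustrative assumptions, not the paper's setup.
    """
    rng = random.Random(seed)
    sums = [0.0, 0.0]    # running payoff totals per arm
    counts = [0, 0]      # number of pulls per arm
    safe_picks = 0
    for t in range(steps):
        if t < 2:
            arm = t                      # pull each arm once to initialize
        elif rng.random() < epsilon:
            arm = rng.randrange(2)       # explore uniformly at random
        else:
            means = [sums[i] / counts[i] for i in range(2)]
            if means[0] == means[1]:
                arm = rng.randrange(2)   # break ties at random
            else:
                arm = 0 if means[0] > means[1] else 1
        reward = 0.0 if arm == 0 else rng.choice([-1.0, 1.0])
        sums[arm] += reward
        counts[arm] += 1
        safe_picks += arm == 0
    return safe_picks / steps


def average_safe_share(runs=200, **kwargs):
    """Average the safe-arm share over independent seeds 0..runs-1."""
    shares = [epsilon_greedy_safe_share(seed=s, **kwargs) for s in range(runs)]
    return sum(shares) / runs


if __name__ == "__main__":
    print(average_safe_share(runs=200, steps=2000, epsilon=0.1))
```

The intuition the sketch exposes: once the risky action's sample mean falls below the safe action's, the risky action is sampled only during exploration, so its pessimistic estimate is corrected slowly; an optimistic estimate is corrected quickly because the action is then chosen often. Under these assumptions the averaged share of safe picks should come out well above one half.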
Abstract
Agents' learning from feedback shapes economic outcomes, and many economic decision-makers today employ learning algorithms to make consequential choices. This note shows that a widely used learning algorithm, $\epsilon$-Greedy, exhibits emergent risk aversion: it prefers actions with lower variance. When presented with actions of the same expectation, under a wide range of conditions, $\epsilon$-Greedy chooses the lower-variance action with probability approaching one. This emergent preference can have wide-ranging consequences, ranging from concerns about fairness to homogenization, and holds transiently even when the riskier action has a strictly higher expected payoff. We discuss two methods to correct this bias. The first method requires the algorithm to reweight data as a function of how likely the actions were to be chosen. The second requires the algorithm to have…
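One common way to "reweight data as a function of how likely the actions were to be chosen" is inverse-propensity weighting: each observed payoff is divided by the probability with which its action was selected. The sketch below uses that standard estimator as a stand-in; the paper's exact weighting scheme may differ, and the function names and the history format are hypothetical. It shows only the estimator itself and makes no claim about the resulting risk preferences.

```python
def epsilon_greedy_probs(greedy_arm, epsilon, n_arms):
    """Selection probabilities of epsilon-Greedy given the current greedy arm.

    Assumes uniform exploration over all arms (an assumption of this sketch).
    """
    probs = [epsilon / n_arms] * n_arms
    probs[greedy_arm] += 1.0 - epsilon
    return probs


def ipw_means(history, n_arms):
    """Inverse-propensity-weighted payoff estimates.

    history: list of (arm, reward, prob) tuples, where prob is the
    probability with which that arm was selected at that step.
    Each arm's estimate is the sum of reward / prob over the steps
    on which it was played, divided by the total number of steps.
    """
    if not history:
        return [0.0] * n_arms
    totals = [0.0] * n_arms
    for arm, reward, prob in history:
        totals[arm] += reward / prob   # rarely chosen arms get larger weights
    steps = len(history)
    return [total / steps for total in totals]


if __name__ == "__main__":
    probs = epsilon_greedy_probs(greedy_arm=0, epsilon=0.1, n_arms=2)
    print(probs)
    print(ipw_means([(0, 1.0, 0.5), (1, 2.0, 0.25)], n_arms=2))
```

Dividing by the selection probability makes each step's contribution an unbiased estimate of the action's expected payoff regardless of how rarely the action is chosen, at the cost of higher variance when probabilities are small.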
Taxonomy
Topics: Water resources management and optimization · Economic theories and models · Advanced Bandit Algorithms Research
