Learning to Charge More: A Theoretical Study of Collusion by Q-Learning Agents
Cristian Chica, Yinglong Guo, Gilad Lerman

TL;DR
This paper provides a theoretical explanation for how Q-learning agents in repeated pricing games can learn to charge supracompetitive prices, highlighting conditions that support collusive behavior without explicit equilibrium computation.
Contribution
It introduces a novel theoretical framework explaining collusive pricing by Q-learning agents in infinite repeated games, including new equilibrium concepts and conditions for collusion.
Findings
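Q-learning agents can learn to charge supracompetitive prices when the game admits both a one-stage Nash equilibrium price and a collusive-enabling price, and the Q-function satisfies certain inequalities at the end of experimentation.
Naive collusion is not a subgame perfect equilibrium unless the collusive-enabling price is itself a one-stage Nash equilibrium.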
Grim trigger policies can sustain collusive behavior as an equilibrium.
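To make the contrast between naive collusion and grim trigger concrete, here is a minimal sketch of a one-shot deviation check in a toy two-price game. Everything in it is an illustrative assumption rather than the paper's model: the profit numbers, the labels LOW (one-stage Nash price) and HIGH (collusive price), and the reading of "naive collusion" as always charging HIGH and "grim trigger" as charging HIGH only if both firms charged HIGH in the previous period.

```python
# Hypothetical two-price stage game with prisoner's-dilemma-like profits.
LOW, HIGH = 0, 1  # LOW: one-stage Nash price, HIGH: collusive price (assumed labels)

# PROFIT[my_price][rival_price] -> my stage profit (illustrative numbers only)
PROFIT = [[1.0, 3.0],
          [0.0, 2.0]]


def naive(state):
    """Assumed naive collusion: charge HIGH regardless of last period's prices."""
    return HIGH


def grim(state):
    """Assumed one-memory grim trigger: HIGH only if both charged HIGH last period."""
    return HIGH if state == (HIGH, HIGH) else LOW


def discounted_value(my_policy, rival_policy, state,
                     first_action=None, gamma=0.9, horizon=2000):
    """Truncated discounted profit; optionally force my first-period action."""
    total, disc = 0.0, 1.0
    for t in range(horizon):
        if t == 0 and first_action is not None:
            mine = first_action
        else:
            mine = my_policy(state)
        # The rival sees the same history from its own side (its price first).
        rival = rival_policy((state[1], state[0]))
        total += disc * PROFIT[mine][rival]
        disc *= gamma
        state = (mine, rival)
    return total


def has_profitable_one_shot_deviation(policy, gamma=0.9):
    """True if, in some state, deviating once beats following the policy."""
    states = [(a, b) for a in (LOW, HIGH) for b in (LOW, HIGH)]
    for state in states:
        follow = discounted_value(policy, policy, state, gamma=gamma)
        for action in (LOW, HIGH):
            deviate = discounted_value(policy, policy, state,
                                       first_action=action, gamma=gamma)
            if deviate > follow + 1e-9:
                return True
    return False
```

With these assumed numbers and a discount factor of 0.9, the check flags a profitable deviation for the naive policy (undercutting once earns more and is never punished) but none for grim trigger. Lowering the discount factor enough makes grim trigger fail the check too, which is the usual patience requirement for punishment-based collusion.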
Abstract
There is growing experimental evidence that Q-learning agents may learn to charge supracompetitive prices. We provide the first theoretical explanation for this behavior in infinite repeated games. Firms update their pricing policies based solely on observed profits, without computing equilibrium strategies. We show that when the game admits both a one-stage Nash equilibrium price and a collusive-enabling price, and when the Q-function satisfies certain inequalities at the end of experimentation, firms learn to consistently charge supracompetitive prices. We introduce a new class of one-memory subgame perfect equilibria (SPEs) and provide conditions under which learned behavior is supported by naive collusion, grim trigger policies, or increasing strategies. Naive collusion does not constitute an SPE unless the collusive-enabling price is a one-stage Nash equilibrium, whereas grim…
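The learning setup described in the abstract, where firms adjust prices using only their own observed profits, can be sketched with two independent tabular Q-learners. This is a generic illustration and not the paper's algorithm or parameterization: the two-price profit table, the one-memory state (last period's price pair), the epsilon-greedy schedule, and all function names are assumptions made for the example.

```python
import random

PRICES = [0, 1]  # assumed labels: 0 = one-stage Nash price, 1 = collusive price
# PROFIT[my_price][rival_price] -> my stage profit (illustrative numbers only)
PROFIT = [[1.0, 3.0],
          [0.0, 2.0]]


def train(steps=5000, alpha=0.1, gamma=0.9, eps_decay=0.999, seed=0):
    """Run two independent epsilon-greedy Q-learners; return their Q-tables."""
    rng = random.Random(seed)
    states = [(a, b) for a in PRICES for b in PRICES]
    # q[i][state][action]: firm i's estimated value of an action in a state
    q = [{s: [0.0 for _ in PRICES] for s in states} for _ in range(2)]
    state = (0, 0)  # one-memory state: last period's (firm 0, firm 1) prices
    eps = 1.0       # exploration rate, decayed toward zero over time
    for _ in range(steps):
        actions = []
        for i in range(2):
            if rng.random() < eps:
                actions.append(rng.choice(PRICES))      # experiment
            else:
                row = q[i][state]
                actions.append(row.index(max(row)))     # exploit
        next_state = (actions[0], actions[1])
        for i in range(2):
            # Each firm observes only its own realized profit.
            profit = PROFIT[actions[i]][actions[1 - i]]
            target = profit + gamma * max(q[i][next_state])
            q[i][state][actions[i]] += alpha * (target - q[i][state][actions[i]])
        state = next_state
        eps *= eps_decay
    return q


def greedy_policy(q_i):
    """Map each state to the price with the highest learned Q-value."""
    return {s: row.index(max(row)) for s, row in q_i.items()}
```

Calling greedy_policy(train()[0]) gives the one-memory pricing rule firm 0 would follow once experimentation ends. Which rule emerges depends on the seed and parameters, so the sketch makes no claim about whether a particular run ends in collusion.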
