Maximum a posteriori learning in demand competition games
Mohsen Rakhshan

TL;DR
This paper demonstrates that in a competitive inventory game, players can learn to play the Nash equilibrium through repeated observations and MAP estimation, even without knowing opponents' actions or utilities.
Contribution
It proves that players can learn the Nash policy and converge to equilibrium using MAP estimation based solely on their own sales observations.
Findings
Players' actions converge to Nash equilibrium.
Beliefs about opponents' strategies also converge.
MAP estimation enables learning without full information.
Abstract
We consider an inventory competition game between two firms. The question we address is this: if players know neither the opponent's action nor the opponent's utility function, can they learn to play the Nash policy in a repeated game by observing only their own sales? We prove that, by means of Maximum A Posteriori (MAP) estimation, players can learn the Nash policy, and that players' actions and beliefs converge to the Nash equilibrium.
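To make the idea of MAP estimation from a firm's own sales concrete, here is a minimal single-firm sketch. It is not the paper's model. It assumes, purely for illustration, that a firm's effective demand is Poisson with an unknown rate, that sales are demand censored at the stock level, and that the prior is a discrete distribution over a grid of candidate rates. The function names (`sales_log_likelihood`, `map_estimate`, `simulate_sales`) and all parameter values are hypothetical.

```python
import math
import random


def log_poisson_pmf(k, lam):
    """Log of P(D = k) for D ~ Poisson(lam)."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def sales_log_likelihood(sales, stock, lam):
    """Log-likelihood of one (sales, stock) observation under censored demand."""
    if sales < stock:
        # No stockout: sales reveal demand exactly.
        return log_poisson_pmf(sales, lam)
    # Stockout: we only learn that demand was at least `stock`.
    below = sum(math.exp(log_poisson_pmf(k, lam)) for k in range(stock))
    return math.log(max(1.0 - below, 1e-300))


def map_estimate(observations, candidates, log_prior):
    """Return the candidate rate with the highest log-posterior."""
    def log_posterior(lam):
        return log_prior(lam) + sum(
            sales_log_likelihood(s, y, lam) for s, y in observations
        )
    return max(candidates, key=log_posterior)


def simulate_sales(true_rate, stock, periods, rng):
    """Draw Poisson demand (Knuth's method) and censor it at the stock level."""
    observations = []
    threshold = math.exp(-true_rate)
    for _ in range(periods):
        demand, product = 0, rng.random()
        while product > threshold:
            demand += 1
            product *= rng.random()
        observations.append((min(demand, stock), stock))
    return observations


# Illustrative run with assumed values: true rate 8, stock 12, 500 periods.
rng = random.Random(0)
data = simulate_sales(true_rate=8.0, stock=12, periods=500, rng=rng)
grid = [float(r) for r in range(1, 21)]  # assumed candidate rates
print(map_estimate(data, grid, lambda lam: 0.0))  # uniform prior on the grid
```

The sketch covers only the estimation step for one firm and does not model the opponent at all. In the game setting described in the abstract, each player would update its belief from its own sales in this spirit and then choose its next inventory level given that belief.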
Taxonomy
Topics: Game Theory and Applications · Experimental Behavioral Economics Studies · Economic theories and models
