Computational Performance of Deep Reinforcement Learning to find Nash Equilibria
Christoph Graf, Viktor Zobernig, Johannes Schmidt, Claude Klöckl

TL;DR
This paper evaluates how different parameter settings of the deep reinforcement learning algorithm DDPG affect its ability to find Nash equilibria in price competition models, showing potential for studying strategic firm behavior.
Contribution
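It systematically tests how DDPG's parameters (learning rates, memory buffers, state-space dimensioning, normalizations, and noise decay rates) affect convergence to the analytically derived Bertrand equilibrium in a competitive pricing setting, and identifies configurations that converge reliably.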
Findings
Some parameter choices reach convergence rates of up to 99% to the Bertrand equilibrium.
Certain parameter configurations significantly improve learning stability.
Deep RL can effectively model strategic firm interactions in complex environments.
Abstract
We test the performance of deep deterministic policy gradient (DDPG), a deep reinforcement learning algorithm able to handle continuous state and action spaces, at learning Nash equilibria in a setting where firms compete in prices. These algorithms are typically considered model-free because they do not require transition probability functions (as in, e.g., Markov games) or predefined functional forms. Despite being model-free, a large set of parameters is utilized in various steps of the algorithm. These include learning rates, memory buffers, state-space dimensioning, normalizations, and noise decay rates. The purpose of this work is to systematically test the effect of these parameter configurations on convergence to the analytically derived Bertrand equilibrium. We find parameter choices that can reach convergence rates of up to 99%. The reliable convergence may make the method a…
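To make "convergence to the analytically derived Bertrand equilibrium" concrete, here is a minimal sketch of how such a benchmark and a convergence rate could be computed. The abstract does not state the paper's demand specification or convergence criterion, so everything below is an assumption for illustration: a symmetric duopoly with linear differentiated demand q_i = a - b*p_i + d*p_j, constant marginal cost c, and an absolute price tolerance `tol`. The function names are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def best_response(p_rival: float, a: float, b: float, d: float, c: float) -> float:
    """Profit-maximizing price against a rival price.

    Assumed model (not from the paper): demand q_i = a - b*p_i + d*p_j and
    profit (p_i - c) * q_i. Setting the derivative with respect to p_i to
    zero gives p_i = (a + b*c + d*p_j) / (2*b).
    """
    return (a + b * c + d * p_rival) / (2 * b)


def bertrand_nash_price(a: float, b: float, d: float, c: float) -> float:
    """Symmetric Nash price: the fixed point of best_response.

    Solving p = (a + b*c + d*p) / (2*b) gives p* = (a + b*c) / (2*b - d),
    which requires 2*b > d.
    """
    if not 2 * b > d:
        raise ValueError("need 2*b > d for a well-defined symmetric equilibrium")
    return (a + b * c) / (2 * b - d)


def convergence_rate(
    final_prices: Sequence[Sequence[float]], p_star: float, tol: float
) -> float:
    """Share of training runs in which every firm ends within tol of p_star.

    Each element of final_prices holds the firms' final prices from one run.
    The tolerance-based criterion is an illustrative assumption.
    """
    if not final_prices:
        raise ValueError("need at least one run")
    hits = sum(
        1 for run in final_prices if all(abs(p - p_star) <= tol for p in run)
    )
    return hits / len(final_prices)


if __name__ == "__main__":
    # Hypothetical demand and cost parameters, chosen only for illustration.
    p_star = bertrand_nash_price(a=10.0, b=2.0, d=1.0, c=1.0)
    # Hypothetical final prices from four two-firm runs (not real results).
    runs = [(4.0, 4.01), (4.0, 5.0), (3.99, 4.0), (4.02, 3.98)]
    print(p_star, convergence_rate(runs, p_star, tol=0.05))
```

In a study of this kind, the final prices would come from trained DDPG agents under each parameter configuration, and the rate would be reported per configuration; the run data in the usage example are made up.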
Taxonomy
Topics: Innovation Diffusion and Forecasting · Auction Theory and Applications
