A Mathematical Programming Approach to Computing and Learning Berk-Nash Equilibria in Infinite-Horizon MDPs

Quanyan Zhu; Zhengye Han

arXiv:2603.13641 · cs.GT · March 17, 2026

TL;DR

This paper introduces a mathematical programming approach to computing and learning Berk-Nash equilibria in infinite-horizon MDPs when the agent's model class is misspecified, and proposes an online learning scheme with convergence guarantees.

Contribution

It provides a rigorous characterization of Berk-Nash equilibria via coupled linear programs and bilevel optimization, and develops an entropy-regularized, smooth objective for efficient learning.
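
To make the "coupled linear programs" concrete, one standard way to write them is sketched below. The notation is assumed here and is not necessarily the paper's: a finite state set S and action set A, reward r, discount γ ∈ (0,1), initial distribution ν₀, true kernel P*, and a parametric model class {P_θ : θ ∈ Θ}, with beliefs simplified to a single point conjecture θ. Problem (i) is the usual occupation-measure LP for the agent's subjective MDP, with the policy read off at states of positive occupancy; problem (ii) requires θ to minimize KL divergence from the truth, weighted by the long-run state-action frequencies m_π that the policy generates under P*. A pair (π, θ) satisfying both is a Berk-Nash equilibrium in this simplified sense.

```latex
% (i) Agent's problem under conjecture \theta: occupation-measure LP of the subjective MDP
\max_{\mu \ge 0} \; \sum_{s,a} \mu(s,a)\, r(s,a)
\quad \text{s.t.} \quad
\sum_{a} \mu(s',a) = (1-\gamma)\,\nu_0(s') + \gamma \sum_{s,a} P_\theta(s' \mid s,a)\, \mu(s,a)
\quad \forall s' \in S,
\qquad
\pi(a \mid s) = \frac{\mu(s,a)}{\sum_{a'} \mu(s,a')}

% (ii) Consistency: \theta minimizes KL to the truth, weighted by the long-run
% state-action frequencies m_\pi that \pi generates under the true kernel P^*
\theta \in \operatorname*{arg\,min}_{\theta' \in \Theta} \;
\sum_{s,a} m_\pi(s,a)\,
\mathrm{KL}\big(P^*(\cdot \mid s,a) \,\big\|\, P_{\theta'}(\cdot \mid s,a)\big),
\qquad
\sum_{a} m_\pi(s',a) = \sum_{s,a} P^*(s' \mid s,a)\, m_\pi(s,a),
\quad
m_\pi(s,a) = \Big(\sum_{a'} m_\pi(s,a')\Big)\, \pi(a \mid s),
\quad
\sum_{s,a} m_\pi(s,a) = 1
```

The weights m_π depend on π, and π depends on θ through (i), so the two problems are coupled; this dependence is what makes a bilevel formulation natural.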

Findings

01. Proves the existence of a unique soft Bellman fixed point under entropy regularization.
02. Develops an online learning algorithm with sublinear regret.
03. Demonstrates convergence to the KL-minimizing model in numerical experiments.
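
Findings 01 and 03 can be illustrated with a small tabular sketch, assuming finite states and actions, an entropy temperature `tau`, and a finite list of candidate kernels. `soft_value_iteration` iterates the soft Bellman operator V(s) = τ log Σ_a exp(Q(s,a)/τ); because log-sum-exp is non-expansive in the sup norm, the operator is a γ-contraction, so the iteration converges to a unique fixed point from any start. `stationary_state_action` and `weighted_kl` then score each candidate model by its KL divergence from the true kernel, weighted by the long-run frequencies the resulting policy induces. All function names and the random instance are hypothetical, and this is not the paper's algorithm.

```python
import numpy as np


def soft_value_iteration(P, r, gamma=0.9, tau=0.1, tol=1e-10, max_iter=10_000):
    """Iterate the entropy-regularized (soft) Bellman operator to its fixed point.

    P: array (S, A, S) with P[s, a, s'] = Pr(s' | s, a) under the agent's model.
    r: array (S, A) of rewards; tau > 0 is the entropy temperature.
    Returns the soft value V, shape (S,), and the softmax policy pi, shape (S, A).
    """
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        Q = r + gamma * (P @ V)                       # (S, A) one-step lookahead
        q_max = Q.max(axis=1, keepdims=True)          # shift for a stable log-sum-exp
        V_new = (q_max + tau * np.log(np.exp((Q - q_max) / tau).sum(axis=1, keepdims=True)))[:, 0]
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = r + gamma * (P @ V)
    pi = np.exp((Q - Q.max(axis=1, keepdims=True)) / tau)
    pi /= pi.sum(axis=1, keepdims=True)               # Boltzmann policy: pi ∝ exp(Q / tau)
    return V, pi


def stationary_state_action(P_true, pi):
    """Long-run state-action frequencies m(s, a) of pi under the true kernel."""
    S = P_true.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P_true)        # state-to-state chain under pi
    # Solve d P_pi = d with sum(d) = 1 (assumes the chain has a unique stationary law).
    lhs = np.vstack([P_pi.T - np.eye(S), np.ones((1, S))])
    rhs = np.concatenate([np.zeros(S), [1.0]])
    d = np.linalg.lstsq(lhs, rhs, rcond=None)[0]
    return d[:, None] * pi


def weighted_kl(m, P_true, P_model, eps=1e-12):
    """Sum over (s, a) of m(s, a) * KL(P_true(. | s, a) || P_model(. | s, a))."""
    kl = (P_true * np.log((P_true + eps) / (P_model + eps))).sum(axis=2)
    return float((m * kl).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 3, 2
    P_true = rng.dirichlet(np.ones(S), size=(S, A))   # hypothetical true kernel
    r = rng.uniform(size=(S, A))
    candidates = [rng.dirichlet(np.ones(S), size=(S, A)), P_true]  # hypothetical conjectures
    V, pi = soft_value_iteration(candidates[0], r, tau=0.5)  # act on the wrong model
    m = stationary_state_action(P_true, pi)                  # data come from the truth
    scores = [weighted_kl(m, P_true, Pc) for Pc in candidates]
    print(int(np.argmin(scores)))                     # 1: the well-specified candidate scores 0
```

In a genuinely misspecified problem `P_true` would not be among the candidates, and the argmin picks the KL-closest one. A Berk-Nash pair additionally requires the policy to be soft-optimal for the selected model; simply alternating the two steps is one possible heuristic, but it is not the paper's LP or bilevel method and is not guaranteed to converge.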

Abstract

We study sequential decision-making when the agent's internal model class is misspecified. Within the infinite-horizon Berk-Nash framework, stable behavior arises as a fixed point: the agent acts optimally relative to a subjective model, while that model is statistically consistent with the long-run data endogenously generated by the policy itself. We provide a rigorous characterization of this equilibrium via coupled linear programs and a bilevel optimization formulation. To address the intrinsic non-smoothness of standard best-response correspondences, we introduce entropy regularization, establishing the existence of a unique soft Bellman fixed point and a smooth objective. Exploiting this regularity, we develop an online learning scheme that casts model selection as an adversarial bandit problem using an EXP3-type update, augmented by a novel conjecture-set zooming mechanism that…
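
The EXP3-type model-selection step mentioned in the abstract can be sketched with textbook EXP3, treating each candidate model in a finite conjecture set as an arm. The feedback `reward_fn(arm, t)` is a hypothetical placeholder (any score in [0, 1], for example a normalized fit measure); the abstract does not specify the paper's feedback signal, and the conjecture-set zooming mechanism is not modeled here.

```python
import math
import random


def exp3_probs(log_w, explore):
    """Mix exponential weights with uniform exploration."""
    k = len(log_w)
    top = max(log_w)                                  # shift for numerical stability
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [(1 - explore) * wi / total + explore / k for wi in w]


def exp3(num_arms, reward_fn, horizon, explore=0.1, seed=0):
    """Textbook EXP3 over `num_arms` candidate models.

    reward_fn(arm, t) -> float in [0, 1] (hypothetical feedback for the chosen model).
    Returns the sequence of chosen arms and the final sampling distribution.
    """
    rng = random.Random(seed)
    log_w = [0.0] * num_arms
    choices = []
    for t in range(horizon):
        probs = exp3_probs(log_w, explore)
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(arm, t)
        # Importance weighting: reward / probs[arm] is an unbiased estimate of the
        # chosen arm's reward; unchosen arms implicitly receive an estimate of 0.
        log_w[arm] += explore * (reward / probs[arm]) / num_arms
        choices.append(arm)
    return choices, exp3_probs(log_w, explore)


if __name__ == "__main__":
    means = [0.2, 0.5, 0.9]                           # hypothetical per-model scores
    choices, probs = exp3(3, lambda arm, t: means[arm], horizon=3000)
    print(max(range(3), key=choices.count), [round(p, 3) for p in probs])
```

With these fixed scores the sampling distribution concentrates on the highest-scoring model while keeping at least `explore / num_arms` probability on every arm. This forced exploration, together with the importance-weighted estimates, is what lets EXP3 achieve O(√(T K log K)) regret against adversarial reward sequences for a suitably tuned `explore`.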

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Game Theory and Applications