Decision Making under Imperfect Recall: Algorithms and Benchmarks
Emanuel Tewolde, Brian Hu Zhang, Ioannis Anagnostides, Tuomas Sandholm, Vincent Conitzer

TL;DR
This paper introduces a benchmark suite for imperfect-recall decision problems in game theory, evaluates algorithms on it, and finds regret matching algorithms outperform traditional optimizers significantly.
Contribution
It presents the first benchmark suite for imperfect-recall decision problems and demonstrates the effectiveness of regret matching algorithms in this setting.
Findings
Regret matching algorithms outperform traditional optimizers.
Benchmark suite captures diverse problem types including privacy and AI safety.
RM algorithms are effective for large-scale constrained optimization.
Abstract
In game theory, imperfect-recall decision problems model situations in which an agent forgets information it held before. They encompass games such as the "absentminded driver" and team games with limited communication. In this paper, we introduce the first benchmark suite for imperfect-recall decision problems. Our benchmarks capture a variety of problem types, including ones concerning privacy in AI systems that elicit sensitive information, and AI safety via testing of agents in simulation. Across 61 problem instances generated using this suite, we evaluate the performance of different algorithms for finding first-order optimal strategies in such problems. In particular, we introduce the family of regret matching (RM) algorithms for nonlinear constrained optimization. This class of parameter-free algorithms has enjoyed tremendous success in solving large two-player zero-sum games,…
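To make the idea of regret matching as a parameter-free first-order method concrete, here is a minimal sketch on the absentminded driver. It is an illustration under stated assumptions, not the paper's implementation: the payoffs (0 for exiting at the first intersection, 4 for exiting at the second, 1 for continuing through both) are the classic values from the literature and are not given in the text above, and the update shown is plain RM that feeds the gradient of the expected utility in as the per-action utility vector. The names `utility`, `gradient`, and `regret_matching` are hypothetical.

```python
# Sketch only: plain regret matching on a single probability simplex,
# using the gradient of the objective as the utility vector.
# Payoffs below are the classic absentminded-driver values (an assumption;
# they do not appear in the summary above).

def utility(x):
    """Expected utility when the driver continues w.p. x[0] and exits w.p. x[1].

    Exit at first intersection: 0, exit at second: 4, continue twice: 1.
    """
    c, e = x
    return 0.0 * e + 4.0 * c * e + 1.0 * c * c


def gradient(x):
    """Partial derivatives of `utility` with respect to (continue, exit)."""
    c, e = x
    return [4.0 * e + 2.0 * c, 4.0 * c]


def regret_matching(grad, n_actions, iters=200):
    """Run plain RM; there is no step size or other parameter to tune."""
    regrets = [0.0] * n_actions
    x = [1.0 / n_actions] * n_actions  # start from the uniform strategy
    for _ in range(iters):
        g = grad(x)
        baseline = sum(gi * xi for gi, xi in zip(g, x))
        # Accumulate how much better each pure action looks than the current mix.
        regrets = [r + gi - baseline for r, gi in zip(regrets, g)]
        positive = [max(r, 0.0) for r in regrets]
        total = sum(positive)
        if total > 0:
            x = [p / total for p in positive]  # play proportionally to positive regret
        else:
            x = [1.0 / n_actions] * n_actions
    return x


strategy = regret_matching(gradient, n_actions=2)
print("continue probability:", strategy[0])
print("expected utility:", utility(strategy))
```

With these payoffs the expected utility is 4p - 3p^2 for continue probability p, which is maximized at p = 2/3 with value 4/3, so the printed strategy can be checked against that point. Convergence of the last iterate in this toy case should not be read as a guarantee for nonconcave problems in general.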
Peer Reviews
Decision: Submitted to ICLR 2026
The paper tackles an interesting problem. The empirical evaluation is quite large, spanning multiple and quite different environments. The observations on the performance of RM algorithms in these problems are interesting and deserve further attention and investigation.
It is not entirely clear to me what the contribution of the paper is. It seems to be spread across presenting benchmarks, arguing for a particular evaluation metric, and then a set of evaluations that suggest RM methods should get more attention. I'm not sure it does the first two that effectively. I'm not sure the last is sufficiently impactful, given the long history of imperfect recall being explored in games (using exactly RM-based algorithms; see work coming out of the AI poker competit
S1. Novel benchmark suite fills a clear gap in the literature.
S2. Bridges two research communities (game theory and optimization) by adapting regret matching to nonlinear constraints.
S3. Extensive experiments across diverse problem types and scales, with clear reporting of performance metrics and runtime.
S4. Strong practical insight: RM+ shows remarkable stability and speed, potentially influencing solver design for large imperfect-information systems.
S5. Clarity and reproducibility: meth
**W1.** Lack of theoretical guarantees for RM convergence in general nonconvex constrained settings. While empirical evidence is compelling, a formal proof (even partial or asymptotic) would strengthen the claim that RM can act as a “first-order optimizer.”
**W2.** Limited analysis of failure cases. The appendix briefly mentions instances in which RM converges to poor local optima, but a deeper investigation of when and why this occurs would enhance understanding.
**W3.** Connectio
* The paper explains reasonably well why imperfect-recall problems are important.
* It presents a sufficiently wide range of experiments to convince me that RM-based methods are worth considering for this problem.
* It promises public release of a range of problems that may become a common benchmark for future solvers.
* No need to tune parameters, in contrast to GD.
* Even the local convergence is supported only empirically.
* Negative examples where local optima are bad are hidden in the appendix and referenced to a different paper instead of clearly discussing the limitations.
* The main results in Table 1 are hard to interpret.
Further suggestions: The paper at places seems to make claims about general constrained polynomial optimization, not just imperfect-recall games (L99, L255, 278), which looks confusing to me. Either make clear what wider class o
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Game Theory and Applications
