AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games

Lance Ying; Ryan Truong; Prafull Sharma; Kaiya Ivy Zhao; Nathan Cloos; Kelsey R. Allen; Thomas L. Griffiths; Katherine M. Collins; José Hernández-Orallo; Phillip Isola; Samuel J. Gershman; Joshua B. Tenenbaum

arXiv:2602.17594 · cs.AI · February 20, 2026

Open Access

TL;DR

This paper introduces the AI GameStore, a scalable platform for evaluating human-like general intelligence in AI systems by how well they play, and learn to play, human games. The platform uses large language models with humans in the loop to synthesize the games.

Contribution

The paper proposes an open-ended evaluation framework based on human-designed games and demonstrates its feasibility by creating and testing 100 games sourced from popular digital platforms.

Findings

01

Current models score less than 10% of the human average on most games.

02

Models struggle with games requiring world-model learning, memory, and planning.

03

The platform enables scalable, open-ended evaluation of general intelligence.
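
The first finding compares model scores with the human average. This summary does not state the paper's exact scoring metric, so the following is only a minimal sketch of one plausible reading: a model's score expressed as a percentage of the mean human score on the same game. The function name, the game names, and all numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean


def human_normalized_score(model_score: float, human_scores: list[float]) -> float:
    """Return the model's score as a percentage of the mean human score.

    Assumption: higher raw scores are better and the human mean is nonzero.
    """
    human_avg = mean(human_scores)
    if human_avg == 0:
        raise ValueError("human average is zero; normalization is undefined")
    return 100.0 * model_score / human_avg


# Made-up scores for illustration only (NOT results from the paper):
# each entry is (model score, list of human player scores).
games = {
    "game_a": (12.0, [150.0, 180.0, 210.0]),
    "game_b": (40.0, [90.0, 110.0]),
}

# Percentage of the human average reached by the model on each game.
normalized = {
    name: human_normalized_score(model, humans)
    for name, (model, humans) in games.items()
}

# Games on which the model falls below 10% of the human average.
below_10 = [name for name, pct in normalized.items() if pct < 10.0]
```

Under this reading, "less than 10% of the human average on most games" would mean that more than half of the evaluated games end up in `below_10`.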

Abstract

Rigorously evaluating machine intelligence against the broad spectrum of human general intelligence has become increasingly important and challenging in this era of rapid technological advance. Conventional AI benchmarks typically assess only narrow capabilities in a limited range of human activity. Most are also static, quickly saturating as developers explicitly or implicitly optimize for them. We propose that a more promising way to evaluate human-like general intelligence in AI systems is through a particularly strong form of general game playing: studying how and how well they play and learn to play all conceivable human games, in comparison to human players with the same level of experience, time, or other resources. We define a "human game" to be a game designed by humans for humans, and argue for the evaluative suitability of this space of all such games people can…

Taxonomy

Topics: Artificial Intelligence in Games · Ethics and Social Impacts of AI · Multimodal Machine Learning Applications