Uneven Evolution of Cognition Across Generations of Generative AI Models
Isaac Galatzer-Levy, Daniel McDuff, Xin Liu, Jed McGiffin

TL;DR
This paper introduces a psychometric framework and the AIQ Benchmark to evaluate and track the uneven cognitive development of generative AI models across different modalities and generations.
Contribution
It presents an assessment framework that reveals asymmetric cognitive growth and architectural biases in generative AI, highlighting the limits of current scaling approaches.
Findings
Models perform near-ceiling in verbal tasks but near-floor in perceptual reasoning.
Abstract reasoning improves faster in language than in visual formats.
Visual-perceptual organization remains largely stagnant across generations.
Abstract
The pursuit of artificial general intelligence necessitates robust methods for evaluating the cognitive capabilities of models beyond narrow task performance. Here, we introduce a psychometric framework to assess the cognitive profiles of generative AI, comparing them to human norms and tracking their evolution across generations. Initial evaluation of leading multimodal models using tasks adapted from the Wechsler Adult Intelligence Scale revealed a profoundly uneven cognitive architecture: near-ceiling performance in verbal comprehension and working memory (> percentile) contrasted with near-floor performance in perceptual reasoning (< percentile). To track developmental trajectories beyond human-normed limits, we developed the Artificial Intelligence Quotient (AIQ) Benchmark and applied it to six generations and two model families, revealing significant…
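The abstract reports results as percentiles against human norms. As a minimal sketch of what that comparison means, the snippet below converts a Wechsler-style standard score (index scores are normed to mean 100 and standard deviation 15) into a percentile under a normal distribution. The function name and the example scores are illustrative assumptions. The paper's actual scoring procedure is not shown in this summary.

```python
import math


def standard_score_to_percentile(score: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Return the percentile rank of a standard score under a normal norm.

    Hypothetical helper for illustration; defaults follow the usual
    Wechsler index-score convention (mean 100, SD 15).
    """
    # Convert the score to a z-score relative to the norming population.
    z = (score - mean) / sd
    # Normal CDF via the error function, scaled to a 0-100 percentile.
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Illustrative scores only, not values from the paper.
    for s in (70, 100, 130):
        print(s, round(standard_score_to_percentile(s), 1))
```

Under this convention a score two standard deviations above the mean falls near the top of the human distribution and one two standard deviations below falls near the bottom, which is the sense in which "near-ceiling" and "near-floor" profiles can coexist in a single model.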
