IQ Test for LLMs: An Evaluation Framework for Uncovering Core Skills in LLMs

Aviya Maimon; Amir DN Cohen; Gal Vishne; Shauli Ravfogel; Reut Tsarfaty

arXiv:2507.20208·cs.CL·July 29, 2025

TL;DR

This paper introduces a new evaluation framework for large language models that uses factor analysis to uncover core skills, providing a more interpretable and comprehensive assessment than traditional benchmark scores.

Contribution

The paper proposes a factor-analysis-based paradigm for identifying latent skills in LLMs and applies it to a new leaderboard of 60 LLMs evaluated on 44 tasks, yielding practical tools for model profiling and task redundancy detection.

Findings

01. Identified a small set of latent skills that explains most of the performance variance.

02. Revealed redundancies among benchmark tasks.

03. Provided tools for better model comparison and skill profiling.

Abstract

Current evaluations of large language models (LLMs) rely on benchmark scores, but it is difficult to interpret what these individual scores reveal about a model's overall skills. Specifically, as a community we lack understanding of how tasks relate to one another, what they measure in common, how they differ, or which ones are redundant. As a result, models are often assessed via a single score averaged across benchmarks, an approach that fails to capture the models' holistic strengths and limitations. Here, we propose a new evaluation paradigm that uses factor analysis to identify latent skills driving performance across benchmarks. We apply this method to a comprehensive new leaderboard showcasing the performance of 60 LLMs on 44 tasks, and identify a small set of latent skills that largely explain performance. Finally, we turn these insights into practical tools that identify…
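To make the idea concrete, here is a minimal sketch of extracting latent factors from a models-by-tasks score matrix. It is not the authors' pipeline: it assumes a simple principal-component style extraction from the task correlation matrix, and the score matrix is synthetic, generated from three hypothetical skills. Only the 60 x 44 shape is taken from the abstract; the function name `latent_skill_sketch` and all parameters are illustrative.

```python
import numpy as np

def latent_skill_sketch(scores, n_factors):
    """Return (loadings, explained_ratio) for a (n_models, n_tasks) score matrix.

    Assumption: factors are extracted by eigendecomposition of the task
    correlation matrix, a simple stand-in for a full factor analysis.
    """
    # Standardize each task column so tasks on different scales are comparable.
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    # Task-by-task correlation matrix (diagonal entries equal 1).
    corr = (z.T @ z) / z.shape[0]
    # eigh returns eigenvalues in ascending order; reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings: how strongly each task is associated with each retained factor.
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    # Share of total standardized variance captured by the retained factors.
    explained = eigvals[:n_factors].sum() / eigvals.sum()
    return loadings, explained

# Synthetic data only: 60 models, 44 tasks, 3 hypothetical latent skills.
rng = np.random.default_rng(0)
n_models, n_tasks, n_skills = 60, 44, 3
skills = rng.normal(size=(n_models, n_skills))   # per-model skill levels
mixing = rng.normal(size=(n_skills, n_tasks))    # how each task draws on skills
scores = skills @ mixing + 0.3 * rng.normal(size=(n_models, n_tasks))

loadings, explained = latent_skill_sketch(scores, n_factors=3)
print(loadings.shape, round(float(explained), 3))
```

In a sketch like this, tasks whose loading rows are nearly identical are candidates for redundancy, and projecting a model's standardized scores onto the factors gives a rough skill profile. A real analysis would also need a principled choice of the number of factors and usually a rotation step for interpretability.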
