AGI-Elo: How Far Are We From Mastering A Task?

Shuo Sun; Yimin Zhao; Christina Dao Wen Lee; Jiawei Sun; Chengran Yuan; Zefan Huang; Dongen Li; Justin KW Yeoh; Alok Prakash; Thomas W. Malone; Marcelo H. Ang Jr

arXiv:2505.12844 · cs.AI · May 27, 2025

TL;DR

This paper presents a unified, difficulty-aware rating system for evaluating AI models and humans across multiple domains, providing detailed insights into progress and remaining challenges on the path to AGI.

Contribution

It introduces a novel rating framework that jointly models the difficulty of individual test cases and the competency of models (or humans), enabling fine-grained evaluation beyond aggregate performance metrics.
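
The summary does not spell out the update rule, so the sketch below only illustrates the general idea of rating models and test cases on one scale through "matches". It assumes a standard Elo update with binary outcomes (the model wins a match if it solves the test case), a base-10 logistic curve with scale 400, and a fixed K = 32. These choices, along with the names `expected_success` and `elo_update` and the toy results, are hypothetical and not taken from the paper or its repository.

```python
# Minimal sketch of an Elo-style rating in which models and individual test
# cases are both "players". ASSUMPTIONS (not taken from the paper): binary
# outcomes, a logistic base-10 expectation with scale 400, and a fixed K = 32.


def expected_success(model_rating: float, case_rating: float, scale: float = 400.0) -> float:
    """Probability that the model solves the case under the standard Elo curve."""
    return 1.0 / (1.0 + 10 ** ((case_rating - model_rating) / scale))


def elo_update(model_rating: float, case_rating: float, model_solved: bool,
               k: float = 32.0, scale: float = 400.0) -> tuple[float, float]:
    """Play one model-vs-test-case match and return the updated ratings.

    The model "wins" if it solves the case; otherwise the case "wins".
    """
    expected = expected_success(model_rating, case_rating, scale)
    outcome = 1.0 if model_solved else 0.0
    delta = k * (outcome - expected)
    # Zero-sum update: the rating the model gains is the rating the case loses.
    return model_rating + delta, case_rating - delta


# Hypothetical toy log of (model_id, case_id, solved) results.
results = [
    ("model_a", "case_1", True),
    ("model_a", "case_2", False),
    ("model_b", "case_1", False),
    ("model_b", "case_2", False),
]
models = {"model_a": 1500.0, "model_b": 1500.0}
cases = {"case_1": 1500.0, "case_2": 1500.0}

for model_id, case_id, solved in results:
    models[model_id], cases[case_id] = elo_update(models[model_id], cases[case_id], solved)

print(models)
print(cases)
```

A single sequential pass like this depends on the order of the results. A practical system of this kind might shuffle and repeat passes or fit all ratings jointly. That refinement is also an assumption here, not a description of AGI-Elo's procedure.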

Findings

01. The system captures the long-tail difficulty distribution of real-world test cases.
02. It provides interpretable insights into model progression and the remaining gaps to full task mastery.
03. The system is validated across multiple established datasets and domains.

Abstract

As the field progresses toward Artificial General Intelligence (AGI), there is a pressing need for more comprehensive and insightful evaluation frameworks that go beyond aggregate performance metrics. This paper introduces a unified rating system that jointly models the difficulty of individual test cases and the competency of AI models (or humans) across vision, language, and action domains. Unlike existing metrics that focus solely on models, our approach allows for fine-grained, difficulty-aware evaluations through competitive interactions between models and tasks, capturing both the long-tail distribution of real-world challenges and the competency gap between current models and full task mastery. We validate the generalizability and robustness of our system through extensive experiments on multiple established datasets and models across distinct AGI domains. The resulting rating…
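
The abstract refers to the competency gap between current models and full task mastery. When models and test cases share one rating scale, that gap can be expressed as a rating difference. The sketch below assumes the standard Elo success curve and defines "mastery" as a mean expected success of at least a chosen target over all test cases. The definition, the function names `mean_expected_success` and `rating_for_target`, and the example ratings are hypothetical illustrations, not definitions or results from the paper.

```python
# Minimal sketch: turning ratings into a "distance to mastery". ASSUMPTIONS
# (illustrative, not the paper's definitions): the standard Elo success curve,
# and "mastery" meaning a mean expected success of at least `target` over all
# test cases.


def mean_expected_success(model_rating: float, case_ratings: list[float], scale: float = 400.0) -> float:
    """Average predicted probability that the model solves each test case."""
    total = sum(1.0 / (1.0 + 10 ** ((c - model_rating) / scale)) for c in case_ratings)
    return total / len(case_ratings)


def rating_for_target(case_ratings: list[float], target: float = 0.99,
                      scale: float = 400.0, tol: float = 1e-6) -> float:
    """Smallest model rating whose mean expected success reaches `target`.

    Mean expected success increases monotonically with the model rating,
    so bisection finds the threshold.
    """
    if not case_ratings:
        raise ValueError("case_ratings must be non-empty")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    lo, hi = min(case_ratings), max(case_ratings)
    # Widen the bracket until lo falls short of the target and hi reaches it.
    while mean_expected_success(lo, case_ratings, scale) >= target:
        lo -= scale
    while mean_expected_success(hi, case_ratings, scale) < target:
        hi += scale
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_expected_success(mid, case_ratings, scale) >= target:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical long-tailed test-case ratings and a hypothetical model rating.
case_ratings = [1200.0, 1400.0, 1500.0, 1900.0, 2300.0]
model_rating = 1600.0

needed = rating_for_target(case_ratings, target=0.95)
print(f"expected success now: {mean_expected_success(model_rating, case_ratings):.3f}")
print(f"rating needed for 95%: {needed:.1f} (gap: {needed - model_rating:.1f})")
```

At high targets the hardest test cases dominate the shortfall. A few very difficult cases in the tail can therefore raise the required rating sharply, which is why a measure like this is sensitive to the long tail.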

Code & Models

Repositories

SS47816/AGI-Elo (official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)

Methods: Focus