AI Predicts AGI: Leveraging AGI Forecasting and Peer Review to Explore LLMs' Complex Reasoning Capabilities

Fabrizio Davide; Pietro Torre; Leonardo Ercolani; Andrea Gaggioli

arXiv:2412.09385 · cs.AI · April 23, 2025

TL;DR

This study evaluates the ability of large language models to forecast the emergence of AGI by 2030, using automated peer review and benchmarking, revealing diverse predictions and highlighting the need for specialized evaluation methods.

Contribution

Introduces an automated peer review process for LLM forecasts and develops an AGI-specific benchmark to assess LLMs' complex reasoning in speculative scenarios.

Findings

1. LLMs' estimates of the likelihood of AGI vary widely, from 3% to 47.6%.

2. Peer review scores show high reliability (ICC = 0.79).

3. Comparison with external benchmarks shows consistent LLM rankings across evaluation methods.

Abstract

We tasked 16 state-of-the-art large language models (LLMs) with estimating the likelihood of Artificial General Intelligence (AGI) emerging by 2030. To assess the quality of these forecasts, we implemented an automated peer review process (LLM-PR). The LLMs' estimates varied widely, ranging from 3% (Reka-Core) to 47.6% (GPT-4o), with a median of 12.5%. These estimates closely align with a recent expert survey that projected a 10% likelihood of AGI by 2027, underscoring the relevance of LLMs in forecasting complex, speculative scenarios. The LLM-PR process demonstrated strong reliability, evidenced by a high Intraclass Correlation Coefficient (ICC = 0.79), reflecting notable consistency in scoring across the models. Among the models, Pplx-70b-online emerged as the top performer, while Gemini-1.5-pro-api ranked the lowest. A cross-comparison with external benchmarks, such as LMSYS…
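The abstract reports an Intraclass Correlation Coefficient for the peer review scores but does not say which ICC form was used. As a rough illustration only, the sketch below computes one common variant, ICC(2,1) (two-way random effects, absolute agreement, single rater), from a matrix whose rows are the reviewed forecasts and whose columns are the reviewing models. The choice of variant, the function name `icc2_1`, and the toy scores are all assumptions for illustration; none of them come from the paper.

```python
import numpy as np


def icc2_1(scores):
    """ICC(2,1) for an (n_subjects, k_raters) score matrix.

    Hypothetical helper: the paper's abstract does not state which ICC
    form it reports, so this variant is an assumption.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()  # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()  # between raters
    ss_total = ((x - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Toy scores (made up, not from the paper): 4 forecasts rated by 3 reviewers.
    toy_scores = [
        [7, 8, 7],
        [4, 5, 3],
        [9, 9, 8],
        [5, 6, 5],
    ]
    print(f"ICC(2,1) = {icc2_1(toy_scores):.3f}")
```

Values near 1 mean the reviewers order and score the forecasts almost identically, while values near 0 mean most of the variance comes from disagreement between reviewers. Other ICC forms, such as the consistency-based or average-rater variants, can give noticeably different numbers on the same data.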

Code & Models

Repositories

LeonardoErcolani/AGILab-Peer-Review
none · Official

Taxonomy

Topics: Big Data and Business Intelligence

Methods: ALIGN