Tool Building as a Path to "Superintelligence"

David Koplow; Tomer Galanti; Tomaso Poggio

arXiv:2602.21061 · cs.AI · February 25, 2026

TL;DR

This paper introduces a benchmark to evaluate large language models' ability to perform complex logical inference, highlighting the importance of tool design for achieving superintelligence through test-time search.

Contribution

The paper develops a benchmark that measures the step-success probability $γ$ of LLMs on out-of-distribution logical inference, and it identifies tool design as a key capability for LLMs to reach superintelligence through test-time search.

Findings

01. Small LLMs' reasoning success declines superlinearly with task depth.

02. Frontier models show partial robustness on complex inference tasks.

03. Precise tool calls are critical for successful reasoning at scale.
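
To make the first finding concrete, here is a minimal sketch of how per-step success compounds over depth. It assumes steps succeed independently and that no search or backtracking is used; both are simplifying assumptions made here, not claims from the paper. The function names and the decay schedule are hypothetical and are not taken from the benchmark.

```python
from math import prod

def chain_success(gammas):
    """Probability that every step succeeds, assuming step i succeeds
    independently with probability gammas[i] (illustrative assumption)."""
    return prod(gammas)

def linear_decay(depth, start=0.95, drop=0.02):
    """Hypothetical schedule in which the step-success probability
    falls by a fixed amount at each additional step."""
    return [max(0.0, start - drop * i) for i in range(depth)]

# Constant step success: ten steps at 0.9 each.
constant = chain_success([0.9] * 10)

# Step success that degrades with depth compounds even faster.
decaying = chain_success(linear_decay(10))

print(f"constant gamma: {constant:.4f}")
print(f"decaying gamma: {decaying:.4f}")
```

Even a constant step-success probability of 0.9 leaves only about a 35% chance of completing ten steps in a row, so any further decline in $γ$ with depth makes long unassisted chains increasingly unlikely to succeed.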

Abstract

The Diligent Learner framework suggests LLMs can achieve superintelligence via test-time search, provided a sufficient step-success probability $γ$. In this work, we design a benchmark to measure $γ$ on logical out-of-distribution inference. We construct a class of tasks involving GF(2) circuit reconstruction that grow more difficult with each reasoning step, and that are, from an information-theoretic standpoint, impossible to reliably solve unless the LLM carefully integrates all of the information provided. Our analysis demonstrates that while the $γ$ value for small LLMs declines superlinearly as depth increases, frontier models exhibit partial robustness on this task. Furthermore, we find that successful reasoning at scale is contingent upon precise tool calls, identifying tool design as a critical capability for LLMs to achieve general superintelligence through the…
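
As a rough illustration of why GF(2) reconstruction demands that all provided information be integrated, the sketch below evaluates polynomials over GF(2), where addition is XOR and multiplication is AND. The representation (a list of monomials, each a tuple of variable indices) and the two example circuits are assumptions chosen for illustration; they are not the paper's task construction.

```python
from itertools import product

def eval_anf(monomials, x):
    """Evaluate a GF(2) polynomial on bit-vector x.
    Each monomial is a tuple of variable indices (AND of those bits);
    the polynomial is the XOR of all its monomials."""
    out = 0
    for mono in monomials:
        term = 1
        for i in mono:
            term &= x[i]
        out ^= term
    return out

# Two hypothetical candidate circuits that differ by a single term.
f = [(0,), (1, 2)]         # x0 + x1*x2
g = [(0,), (1, 2), (3,)]   # x0 + x1*x2 + x3

# Enumerate all 4-bit inputs and collect those that tell f and g apart.
inputs = list(product([0, 1], repeat=4))
disagree = [x for x in inputs if eval_anf(f, x) != eval_anf(g, x)]

print(len(disagree), "of", len(inputs), "inputs distinguish f from g")
```

In this toy case the two candidates agree on every input where `x[3]` is 0, so a solver that ignores the observations involving that bit cannot tell them apart.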

Taxonomy

Topics: Physical Unclonable Functions (PUFs) and Hardware Security · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)