Tool Building as a Path to "Superintelligence"
David Koplow, Tomer Galanti, Tomaso Poggio

TL;DR
This paper introduces a benchmark to evaluate large language models' ability to perform complex logical inference, highlighting the importance of tool design for achieving superintelligence through test-time search.
Contribution
The paper develops a novel benchmark for measuring the per-step reasoning success probability of LLMs and emphasizes the role of tool design in enabling LLMs to reach superintelligence.
Findings
Small LLMs' reasoning success declines superlinearly with task depth.
Frontier models show partial robustness on complex inference tasks.
Precise tool calls are critical for successful reasoning at scale.
Abstract
The Diligent Learner framework suggests LLMs can achieve superintelligence via test-time search, provided the step-success probability is sufficiently high. In this work, we design a benchmark to measure this probability on logical out-of-distribution inference. We construct a class of tasks involving GF(2) circuit reconstruction that grow more difficult with each reasoning step, and that are, from an information-theoretic standpoint, impossible to reliably solve unless the LLM carefully integrates all of the information provided. Our analysis demonstrates that while the step-success probability of small LLMs declines superlinearly as depth increases, frontier models exhibit partial robustness on this task. Furthermore, we find that successful reasoning at scale is contingent upon precise tool calls, identifying tool design as a critical capability for LLMs to achieve general superintelligence through the…
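To make the GF(2) setting concrete, here is a minimal Python sketch. It is not the paper's task construction: the circuit, the helper names (`eval_circuit`, `agreement_when_ignoring`), and the monomial-list encoding are all assumptions made for illustration. It shows only that addition over GF(2) is XOR, that multiplication is AND, and that ignoring a single input can reduce a solver to chance-level agreement with the true output.

```python
from itertools import product

def eval_circuit(monomials, x):
    """Evaluate a GF(2) polynomial on the bit vector x.

    `monomials` is a list of tuples of variable indices (a hypothetical
    encoding chosen for this sketch): each tuple is a product (AND) of
    variables, and the tuples are summed with XOR.
    """
    total = 0
    for mono in monomials:
        term = 1
        for i in mono:
            term &= x[i]      # multiplication in GF(2) is AND
        total ^= term         # addition in GF(2) is XOR
    return total

# Hypothetical example circuit: f(x) = x0*x1 + x2 + x3 over GF(2).
circuit = [(0, 1), (2,), (3,)]

def agreement_when_ignoring(monomials, n, ignored):
    """Fraction of all 2**n inputs on which a solver that treats
    variable `ignored` as 0 still produces the true output."""
    hits = 0
    for x in product((0, 1), repeat=n):
        guess = list(x)
        guess[ignored] = 0
        hits += eval_circuit(monomials, x) == eval_circuit(monomials, guess)
    return hits / 2 ** n

if __name__ == "__main__":
    print(eval_circuit(circuit, (1, 1, 0, 0)))
    print(agreement_when_ignoring(circuit, 4, ignored=3))
    print(agreement_when_ignoring(circuit, 4, ignored=0))
```

In this toy circuit, dropping the linear term `x3` leaves the solver right on only half of the inputs, which is no better than guessing. Dropping `x0` hurts less, because `x0` only matters when `x1` is 1.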
Taxonomy
Topics: Physical Unclonable Functions (PUFs) and Hardware Security · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
