# AI Agents as Universal Task Solvers

**Authors:** Alessandro Achille, Stefano Soatto

PMC · DOI: 10.3390/e28030332 · 2026-03-16

## TL;DR

This paper frames learning to reason as transductive inference: instead of approximating the distribution of past data, an AI agent captures its algorithmic structure so that new tasks can be solved in less time.

## Contribution

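The paper introduces a theoretical framework for transductive inference, developed in the verifiable setting where a checker or reward function is available, that ties the achievable speed-up on a new task to the algorithmic information the task shares with the training data.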

## Key findings

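- The optimal speed-up on a new task is tightly related to the algorithmic information it shares with the training data, which gives a theoretical justification for the power-law scaling observed empirically in reasoning models.
- Transductive inference yields its greatest benefits when the data-generating mechanism is most complex, in contrast to the simplicity favored by the compression view of learning.
- In the limit of unbounded model size and compute, models with access to a reward signal can behave as savants, brute-forcing solutions without acquiring transferable reasoning strategies; time is therefore a critical quantity to optimize when scaling.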

## Abstract

We describe AI agents as stochastic dynamical systems and frame the problem of learning to reason as in transductive inference: Rather than approximating the distribution of past data as in classical induction, the objective is to capture its algorithmic structure so as to reduce the time needed to solve new tasks. In this view, information from past experience serves not only to reduce a model’s uncertainty, as in Shannon’s classical theory, but to reduce the computational effort required to find solutions to unforeseen tasks. Working in the verifiable setting, where a checker or reward function is available, we establish three main results. First, we show that the optimal speed-up for a new task is tightly related to the algorithmic information it shares with the training data, yielding a theoretical justification for the power-law scaling empirically observed in reasoning models. Second, while the compression view of learning, rooted in Occam’s Razor, favors simplicity, we show that transductive inference yields its greatest benefits precisely when the data-generating mechanism is most complex. Third, we identify a possible failure mode of naïve scaling: in the limit of unbounded model size and computing, models with access to a reward signal can behave as savants, brute-forcing solutions without acquiring transferable reasoning strategies. Accordingly, we argue that a critical quantity to optimize when scaling reasoning models is time, the role of which in learning has remained largely unexplored.
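The idea that past experience can reduce the time needed to solve a verifiable task can be illustrated with a toy guess-and-check search. The sketch below is not the paper's construction: the string-matching task, the chunk-based ordering, and all function names (`make_checker`, `learned_order`, `solve`) are hypothetical choices made for illustration. It compares the number of checker calls used by plain brute-force enumeration with an enumeration that tries candidates built from pieces of past solutions first.

```python
from itertools import product

ALPHABET = "abcd"  # hypothetical toy alphabet


def make_checker(secret):
    """Return a verifier that only reports whether a candidate is correct."""
    return lambda candidate: candidate == secret


def brute_force_order(length):
    """Enumerate every string of the given length in lexicographic order."""
    for letters in product(ALPHABET, repeat=length):
        yield "".join(letters)


def learned_order(past_solutions, length, chunk=2):
    """Try strings assembled from chunks seen in past solutions first,
    then fall back to brute force so the search stays complete."""
    chunks = sorted({s[i:i + chunk]
                     for s in past_solutions
                     for i in range(0, len(s), chunk)})
    seen = set()
    for parts in product(chunks, repeat=length // chunk):
        candidate = "".join(parts)
        seen.add(candidate)
        yield candidate
    for candidate in brute_force_order(length):
        if candidate not in seen:
            yield candidate


def solve(checker, candidates):
    """Return (solution, number of checker calls); solution is None if not found."""
    calls = 0
    for candidate in candidates:
        calls += 1
        if checker(candidate):
            return candidate, calls
    return None, calls


past = ["dcbd", "bddc"]          # solutions to earlier tasks
checker = make_checker("dcdc")   # new task that reuses the same chunks

_, brute_calls = solve(checker, brute_force_order(4))
_, learned_calls = solve(checker, learned_order(past, 4))
print(brute_calls, learned_calls)
```

In this toy setting the new task shares its building blocks with the past solutions, so the learned ordering reaches the answer in far fewer checker calls than brute force. For a task that shares nothing with past data, such as the secret `"aaaa"`, the learned ordering pays a small overhead instead, which loosely mirrors the claim that the speed-up depends on the structure shared between the new task and the training data.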

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13025811/full.md

---
Source: https://tomesphere.com/paper/PMC13025811