# Guess-and-Learn (G&L): Measuring the Cumulative Error Cost of Cold-Start Adaptation

**Authors:** Roland Arnold

arXiv: 2508.21270 · 2025-09-01

## TL;DR

This paper introduces Guess-and-Learn (G&L), a framework that measures the cumulative mistakes a model makes while adapting from a cold start, a cost that final-accuracy metrics overlook.

## Contribution

G&L formalizes a protocol for evaluating cold-start adaptability, relates it to classical mistake-bound theory, and provides a reproducible benchmark for early-learning efficiency.
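As a rough illustration of the quantity involved, the cumulative error after $T$ steps can be written as a mistake count. The notation here ($M_T$, $\hat{y}_t$, $y_t$) is an assumption made for this summary and may differ from the paper's own symbols:

$$
M_T = \sum_{t=1}^{T} \mathbb{1}\left[\hat{y}_t \neq y_t\right]
$$

A well-known example from classical mistake-bound theory is the Perceptron bound for linearly separable data: if $\|x_t\| \le R$ for all $t$ and some unit vector $u$ satisfies $y_t \langle u, x_t \rangle \ge \gamma > 0$, then the total number of mistakes is bounded regardless of sequence length. It is shown only as an instance of the kind of result the protocol relates to, not as the paper's specific analysis:

$$
M_T \le \left(\frac{R}{\gamma}\right)^{2}
$$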

## Key findings

- Smaller models can adapt with fewer initial errors than larger ones.
- The benefit of pretraining varies by domain (MNIST vs. AG News).
- Current models remain well above the heuristic oracle reference band, indicating an adaptability gap.

## Abstract

Evaluation of machine learning models typically emphasizes final accuracy, overlooking the cost of adaptation: the cumulative errors incurred while learning from scratch. Guess-and-Learn (G&L) v1.0 addresses this gap by measuring cold-start adaptability, defined as the total mistakes a model makes while sequentially labeling an unlabeled dataset. At each step, the learner selects an instance, predicts its label, receives the ground truth, and updates parameters under either online (per-sample) or batch (delayed) mode. The resulting error trajectory exposes adaptation speed, selection quality, and bias, dynamics that are invisible to endpoint metrics.

G&L defines four tracks (Scratch/Pretrained $\times$ Online/Batch) to disentangle the effects of initialization and update frequency. We formalize the protocol, relate it to classical mistake-bound theory, and estimate a heuristic "oracle reference band" for MNIST as a plausibility reference. Baseline experiments on MNIST and AG News, spanning classical methods (Perceptron, k-NN), convolutional architectures (CNN, ResNet-50), and pretrained transformers (ViT-B/16, BERT-base), reveal systematic differences in early-phase efficiency: smaller models can adapt with fewer initial errors, while pretraining benefits vary by domain. Across settings, current models remain well above the oracle band, highlighting an adaptability gap.

By quantifying the mistake cost of early learning, G&L complements conventional benchmarks and provides a reproducible framework for developing learners that are not only accurate in the limit but also reliable from the first examples.
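The select, predict, reveal, update loop described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's reference implementation: the function `guess_and_learn`, the `batch_size` parameter, the random selection strategy, the toy two-cluster dataset, and the small `Perceptron` class are all hypothetical names and choices introduced here.

```python
import random


def guess_and_learn(X, y, predict, update, batch_size=1, seed=0):
    """Run one pass over the pool and return the cumulative-error trajectory.

    batch_size=1 mimics the online mode (update after every sample);
    batch_size>1 mimics the batch mode (updates are delayed).
    """
    rng = random.Random(seed)
    pool = list(range(len(X)))
    rng.shuffle(pool)  # assumption: instances are selected uniformly at random
    buffer, errors, trajectory = [], 0, []
    for idx in pool:
        guess = predict(X[idx])           # predict before the label is known
        errors += int(guess != y[idx])    # ground truth revealed; count mistake
        trajectory.append(errors)
        buffer.append((X[idx], y[idx]))
        if len(buffer) >= batch_size:     # apply the pending update(s)
            update(buffer)
            buffer = []
    if buffer:                            # flush any leftover partial batch
        update(buffer)
    return trajectory


class Perceptron:
    """Tiny from-scratch learner with labels in {-1, +1}."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def predict(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score > 0 else -1

    def update(self, batch):
        # Classic mistake-driven rule: adjust weights only on misclassified samples.
        for x, label in batch:
            if self.predict(x) != label:
                self.w = [wi + label * xi for wi, xi in zip(self.w, x)]
                self.b += label


def make_toy_data(n=200, seed=1):
    """Two noisy 2-D clusters; purely synthetic stand-in data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice([-1, 1])
        X.append([2.0 * label + rng.gauss(0, 0.5),
                  2.0 * label + rng.gauss(0, 0.5)])
        y.append(label)
    return X, y


X, y = make_toy_data()

online_model = Perceptron(dim=2)
online = guess_and_learn(X, y, online_model.predict, online_model.update,
                         batch_size=1)

delayed_model = Perceptron(dim=2)
delayed = guess_and_learn(X, y, delayed_model.predict, delayed_model.update,
                          batch_size=32)

print("online total mistakes: ", online[-1])
print("batch total mistakes:  ", delayed[-1])
```

The returned trajectory, rather than its final value alone, is the object of interest: plotting `online` against `delayed` shows how update frequency shapes early mistakes. A pretrained track would correspond to passing in a learner whose weights are already initialized instead of zeros.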

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21270/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21270/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/2508.21270/full.md

---
Source: https://tomesphere.com/paper/2508.21270