Learning Shrinks the Hard Tail: Training-Dependent Inference Scaling in a Solvable Linear Model
Noam Levi

TL;DR
This paper introduces a solvable linear model that links neural scaling laws to inference, showing how training reduces the impact of hard-to-learn instances and predicting the behavior of pass@$k$ failure rates.
Contribution
It presents the Latent Instance Difficulty (LID) model, connecting training-dependent inference scaling to the tail of target difficulty distributions, with testable predictions.
Findings
The pass@$k$ failure rate follows a power-law decay in $k$ with a training-dependent exponent.
Training shrinks the hard tail of the error distribution, improving generalization.
Model predictions are validated on simulations and real-data proxies.
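As a rough illustration of how a power law in $k$ can arise from heterogeneous instance difficulty, the sketch below uses a toy stand-in rather than the LID model itself. It assumes each instance has a fixed per-attempt failure probability $p$ drawn from a Beta(a, b) distribution and that the $k$ attempts are independent, so the pass@$k$ failure rate is $E[p^k]$. The Beta choice, the parameter names `a` and `b`, and both function names are hypothetical and do not come from the paper. Under these assumptions $E[p^k]$ has a closed form and decays like $k^{-b}$ for large $k$, so the exponent is set by how much mass sits near $p = 1$ (the hard tail).

```python
import math

def passk_failure(k, a, b):
    """E[p^k] for p ~ Beta(a, b): chance that all k independent attempts fail.

    Toy assumption: Beta-distributed per-instance failure probability,
    not the distribution used in the paper.
    """
    # E[p^k] = Gamma(a+k) Gamma(a+b) / (Gamma(a) Gamma(a+b+k)), computed in log space.
    log_value = (math.lgamma(a + k) - math.lgamma(a)
                 + math.lgamma(a + b) - math.lgamma(a + b + k))
    return math.exp(log_value)

def local_exponent(k, a, b):
    """Negative log-log slope of the failure rate between k and 2k."""
    f1 = passk_failure(k, a, b)
    f2 = passk_failure(2 * k, a, b)
    return -(math.log(f2) - math.log(f1)) / math.log(2.0)

if __name__ == "__main__":
    # Larger b means less mass near p = 1 (a lighter hard tail) in this toy.
    for b in (0.5, 1.0, 2.0):
        slopes = [round(local_exponent(k, 2.0, b), 3) for k in (4, 64, 4096)]
        print(f"b={b}: local exponents at k=4, 64, 4096 -> {slopes}")
```

In this toy the measured slope at small $k$ is shallower than its asymptotic value $b$, and a lighter hard tail (larger `b`) gives a steeper curve. How training actually changes the tail is the subject of the paper and is not modeled here.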
Abstract
We analyze neural scaling laws in a solvable model of last-layer fine-tuning where targets have intrinsic, instance-heterogeneous difficulty. In our Latent Instance Difficulty (LID) model, each input's target variance is governed by a latent "precision" drawn from a heavy-tailed distribution. While generalization loss recovers standard scaling laws, our main contribution connects this to inference. The pass@$k$ failure rate exhibits a power-law decay in $k$, but the observed exponent is training-dependent. It grows with sample size before saturating at an intrinsic limit set by the difficulty distribution's tail. This coupling reveals that learning shrinks the "hard tail" of the error distribution: improvements in the model's generalization error steepen the pass@$k$ curve until irreducible target variance dominates. The LID model…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
