Scaling Laws in the Tiny Regime: How Small Models Change Their Mistakes

Mohammed Alnemari; Rizwan Qureshi; Nader Begrazadah

arXiv:2603.07365·cs.LG·March 10, 2026

TL;DR

This study examines how the error of small neural networks (under 20 million parameters) falls as they scale, finding that error rate follows an approximate power law in model size and that their mistake patterns differ from those of larger models.

Contribution

The paper provides the first detailed analysis of neural scaling laws in the tiny-model regime, highlighting how error structure and calibration differ from those of large models.

Findings

1. Error rate follows approximate power laws with scale in small models.

2. Small models change which inputs they misclassify as size increases.

3. Small models are surprisingly well calibrated despite their size.
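One way to make the second finding concrete is to compare the sets of test inputs that two models of different size get wrong. The sketch below uses the Jaccard overlap of the two error sets; this metric and the name `error_overlap` are assumptions chosen for illustration, not necessarily the measure the paper uses, and the arrays are toy data.

```python
import numpy as np

def error_overlap(y_true, pred_small, pred_large):
    """Jaccard overlap between the error sets of two models.

    Returns |errors_small AND errors_large| / |errors_small OR errors_large|.
    A value near 1 means both models fail on the same inputs; a value near 0
    means the larger model makes different mistakes, not just fewer.
    """
    y_true = np.asarray(y_true)
    wrong_small = np.asarray(pred_small) != y_true
    wrong_large = np.asarray(pred_large) != y_true
    both = np.sum(wrong_small & wrong_large)
    either = np.sum(wrong_small | wrong_large)
    # If neither model makes any error, treat the error sets as identical.
    return float(both / either) if either > 0 else 1.0

# Toy labels and predictions (hypothetical, not results from the paper).
labels = np.array([0, 1, 2, 3, 4, 5])
small = np.array([0, 1, 9, 9, 9, 5])   # wrong on indices 2, 3, 4
large = np.array([0, 9, 2, 3, 9, 5])   # wrong on indices 1, 4
overlap = error_overlap(labels, small, large)
```

Here the two models share one error (index 4) out of four inputs that either one gets wrong, so the overlap is low even though their error counts are similar.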

Abstract

Neural scaling laws describe how model performance improves as a power law with size, but existing work focuses on models above 100M parameters. The sub-20M regime -- where TinyML and edge AI operate -- remains unexamined. We train 90 models (22K--19.8M parameters) across two architectures (plain ConvNet, MobileNetV2) on CIFAR-100, varying width while holding depth and training fixed. Both follow approximate power laws in error rate: $\alpha = 0.156 \pm 0.002$ (ScaleCNN) and $\alpha = 0.106 \pm 0.001$ (MobileNetV2) across five seeds. Since prior work fit cross-entropy loss rather than error rate, direct exponent comparison is approximate; with that caveat, these are 1.4--2x steeper than $\alpha \approx 0.076$ for large language models. The power law does not hold uniformly: local exponents decay with scale, and MobileNetV2 saturates at 19.8M parameters ($\alpha_{\mathrm{local}} =…
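The exponents quoted in the abstract can be read as slopes in log-log space. The following is a minimal sketch, assuming the form error ≈ c · N^(−α) and an ordinary least-squares fit on log-transformed values; the abstract does not state the exact fitting procedure, so the functions `fit_power_law` and `local_exponents` and the synthetic data are illustrative assumptions, not the authors' code or results.

```python
import numpy as np

def fit_power_law(n_params, error):
    """Fit error ~= c * N**(-alpha) by least squares on log-log values.

    Returns (alpha, c). Assumes all inputs are strictly positive.
    """
    log_n = np.log(np.asarray(n_params, dtype=float))
    log_e = np.log(np.asarray(error, dtype=float))
    slope, intercept = np.polyfit(log_n, log_e, 1)
    return -slope, np.exp(intercept)

def local_exponents(n_params, error):
    """Exponent between each pair of consecutive model sizes.

    This is the negative finite-difference slope in log-log space; values
    that shrink toward zero as size grows indicate the curve is flattening.
    """
    log_n = np.log(np.asarray(n_params, dtype=float))
    log_e = np.log(np.asarray(error, dtype=float))
    return -np.diff(log_e) / np.diff(log_n)

# Synthetic error rates generated from an exact power law (hypothetical data).
sizes = np.array([2.2e4, 1.0e5, 1.0e6, 1.98e7])
errors = 3.0 * sizes ** -0.15
alpha, c = fit_power_law(sizes, errors)
local = local_exponents(sizes, errors)
```

On an exact power law the global fit and every local exponent agree; on real measurements, a gap between them is what signals that a single exponent does not hold uniformly across the size range.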

Taxonomy

Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Machine Learning in Materials Science