Theoretical Foundations of Scaling Law in Familial Models
Huan Song, Qingfei Zhao, Ting Long, Shuyu Tian, Hongjun An, Jiawei Shao, Xuelong Li

TL;DR
This paper extends neural scaling laws to familial models with early exits and relay inference, introducing a new scaling variable called Granularity (G) and demonstrating that flexible deployment does not compromise training efficiency.
Contribution
The paper introduces a unified scaling law that incorporates Granularity (G) for familial models and empirically measures how architecture affects training efficiency.
Findings
Granularity penalty follows a multiplicative power law with a small exponent.
The scaling law bridges fixed-compute training with dynamic architectures.
Deployment flexibility does not compromise compute-optimality.
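The sketch below shows one way to read the first finding. It assumes that the granularity penalty multiplies a Chinchilla-style dense loss, i.e. L(N, D, G) = L(N, D) * G^gamma. Several pieces are assumptions for illustration and are not the paper's fitted law: this functional form, the constants (Hoffmann et al.'s published Chinchilla fit, reused here only as placeholders), the value gamma = 0.01, and the helper names `base_loss`, `familial_loss`, and `fit_gamma`.

```python
import math

# Chinchilla-style dense loss L(N, D) = E + A / N**alpha + B / D**beta.
# These are Hoffmann et al.'s published constants, used ONLY as placeholders;
# they are not the constants fitted for familial models in this paper.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

# Hypothetical small granularity exponent (illustrative, not a reported value).
GAMMA = 0.01


def base_loss(n_params: float, n_tokens: float) -> float:
    """Dense-model loss as a function of parameters N and training tokens D."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def familial_loss(n_params: float, n_tokens: float, granularity: int,
                  gamma: float = GAMMA) -> float:
    """Assumed multiplicative form L(N, D, G) = L(N, D) * G**gamma.

    G = 1 recovers the dense baseline. A small gamma means the penalty for
    exposing G sub-models from one backbone grows only slowly with G.
    """
    if granularity < 1:
        raise ValueError("granularity must be >= 1")
    return base_loss(n_params, n_tokens) * granularity**gamma


def fit_gamma(points) -> float:
    """Recover gamma from (N, D, G, observed_loss) tuples.

    Uses least squares through the origin on
    log(observed / L(N, D)) = gamma * log(G). This is a hypothetical
    one-parameter fit, not the paper's estimation procedure.
    """
    num = den = 0.0
    for n, d, g, loss in points:
        x = math.log(g)
        y = math.log(loss / base_loss(n, d))
        num += x * y
        den += x * x
    if den == 0.0:
        raise ValueError("need at least one point with G > 1")
    return num / den
```

With these placeholder values, going from G = 1 to G = 8 raises the predicted loss by a factor of 8^0.01, roughly 1.021. Under this assumed form, that is what a "small exponent" means in practice: a penalty of about 2%.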
Abstract
Neural scaling laws have become foundational for optimizing large language model (LLM) training, yet they typically assume a single dense model as output. This limitation overlooks "familial models," a transformative paradigm essential for realizing ubiquitous intelligence across heterogeneous device-edge-cloud hierarchies. Instead of a static architecture, a familial model combines early exits with relay-style inference to yield G deployable sub-models from a single shared backbone. In this work, we theoretically and empirically extend the scaling law to capture this "one-run, many-models" paradigm by introducing Granularity (G) as a fundamental scaling variable alongside model size (N) and training tokens (D). To quantify this relationship rigorously, we propose a unified functional form L(N, D, G) and parameterize it using large-scale empirical runs. Specifically, we employ…
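The toy sketch below pictures how G sub-models can share one backbone and how relay-style inference can hand work from tier to tier. Everything in it is hypothetical. The `FamilialModel` class, the `exit_after` depths, and the `relay` method are illustrative names. The layers are plain Python callables rather than Transformer blocks. The assumption that every tier emits an exit output is ours and does not describe the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Hidden = List[float]
Layer = Callable[[Hidden], Hidden]
Head = Callable[[Hidden], Hidden]


@dataclass
class FamilialModel:
    """Hypothetical toy familial model: one shared backbone, G exit heads.

    Sub-model g consists of layers[:exit_after[g]] followed by heads[g],
    so all G sub-models share the same backbone layers.
    """
    layers: Sequence[Layer]
    heads: Sequence[Head]
    exit_after: Sequence[int]  # strictly increasing exit depths

    def __post_init__(self) -> None:
        if not self.exit_after or len(self.heads) != len(self.exit_after):
            raise ValueError("need exactly one head per exit")
        depths = list(self.exit_after)
        if any(a >= b for a, b in zip(depths, depths[1:])):
            raise ValueError("exit depths must be strictly increasing")
        if depths[0] < 1 or depths[-1] > len(self.layers):
            raise ValueError("exit depths must lie within the backbone")

    @property
    def granularity(self) -> int:
        """G: the number of deployable sub-models."""
        return len(self.exit_after)

    def run_segment(self, h: Hidden, start: int, stop: int) -> Hidden:
        """Apply backbone layers [start, stop) to hidden state h."""
        for layer in self.layers[start:stop]:
            h = layer(h)
        return h

    def relay(self, x: Hidden, target_exit: int) -> Tuple[List[Hidden], int]:
        """Relay-style inference up to exit `target_exit` (0-based).

        Each tier (e.g. device, then edge, then cloud) resumes from the
        hidden state handed over by the previous tier, so no backbone layer
        is recomputed. Returns each tier's exit output and the depth reached.
        """
        if not 0 <= target_exit < self.granularity:
            raise ValueError("target_exit out of range")
        h, depth, outputs = x, 0, []
        for g in range(target_exit + 1):
            h = self.run_segment(h, depth, self.exit_after[g])
            depth = self.exit_after[g]
            outputs.append(self.heads[g](h))
        return outputs, depth
```

In this toy setup, a six-layer backbone with `exit_after=[2, 4, 6]` gives G = 3 sub-models. Serving a request at the deepest exit runs each of the six layers exactly once across the three tiers.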
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning in Materials Science · Big Data and Digital Economy
