Theoretical Foundations of Scaling Law in Familial Models

Huan Song; Qingfei Zhao; Ting Long; Shuyu Tian; Hongjun An; Jiawei Shao; Xuelong Li

arXiv:2512.23407 · cs.LG · January 26, 2026

Open Access

TL;DR

This paper extends neural scaling laws to familial models with early exits and relay inference, introducing a new scaling variable called Granularity (G) and demonstrating that flexible deployment does not compromise training efficiency.

Contribution

It introduces a unified scaling law incorporating Granularity (G) for familial models and empirically validates the impact of architecture on training efficiency.

Findings

01

Granularity penalty follows a multiplicative power law with a small exponent.

02

The scaling law bridges fixed-compute training with dynamic architectures.

03

Deployment flexibility does not compromise compute-optimality.
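
As a rough illustration of the first finding, the sketch below multiplies a dense-model loss by a power of G. Everything specific in it is an assumption made for illustration: the dense term uses a generic sum-of-power-laws form, the penalty is applied to the whole dense loss, and the constants `e`, `a`, `b`, `alpha`, `beta`, and `gamma` are placeholders rather than values fitted in the paper. The functional form the authors actually propose for L(N, D, G) is not stated here and may differ.

```python
def dense_loss(n_params, n_tokens, *, e=1.7, a=400.0, b=400.0,
               alpha=0.34, beta=0.28):
    """Generic dense scaling form E + A/N^alpha + B/D^beta.

    All constants are illustrative placeholders, not fitted values.
    """
    return e + a / n_params ** alpha + b / n_tokens ** beta


def granularity_penalty(granularity, gamma=0.05):
    """Hypothetical multiplicative penalty G^gamma with a small exponent."""
    if granularity < 1:
        raise ValueError("granularity must be at least 1")
    return granularity ** gamma


def familial_loss(n_params, n_tokens, granularity, gamma=0.05):
    """Assumed form: dense loss scaled by the granularity penalty."""
    return dense_loss(n_params, n_tokens) * granularity_penalty(granularity, gamma)


if __name__ == "__main__":
    # Compare a single-exit model (G = 1) with familial models of growing G.
    for g in (1, 2, 4, 8, 16):
        print(g, round(familial_loss(1e9, 2e10, g), 4))
```

Under this assumed form, G = 1 recovers the dense loss exactly, and a small `gamma` means the loss grows only slowly as more sub-models are carved out of the shared backbone.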

Abstract

Neural scaling laws have become foundational for optimizing large language model (LLM) training, yet they typically assume a single dense model output. This limitation effectively overlooks "Familial models", a transformative paradigm essential for realizing ubiquitous intelligence across heterogeneous device-edge-cloud hierarchies. Transcending static architectures, familial models integrate early exits with relay-style inference to spawn G deployable sub-models from a single shared backbone. In this work, we theoretically and empirically extend the scaling law to capture this "one-run, many-models" paradigm by introducing Granularity (G) as a fundamental scaling variable alongside model size (N) and training tokens (D). To rigorously quantify this relationship, we propose a unified functional form L(N, D, G) and parameterize it using large-scale empirical runs. Specifically, we employ…

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning in Materials Science · Big Data and Digital Economy