Unifying Two Types of Scaling Laws from the Perspective of Conditional Kolmogorov Complexity
Jun Wan

TL;DR
This paper unifies the two types of Scaling Laws for large language models by analyzing training and inference through the lens of conditional Kolmogorov complexity. It shows that both improve the approximation of conditional Kolmogorov complexity by increasing the number of execution steps of a Turing machine.
Contribution
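The paper introduces a unified framework, based on conditional Kolmogorov complexity, for understanding both types of Scaling Laws. It ties model size (training) and the number of intermediate tokens (inference) to the same quantity: the execution steps of a Turing machine.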
Findings
Both types of Scaling Laws improve the approximation of conditional Kolmogorov complexity by increasing the number of execution steps.
For the training-based Scaling Laws, a larger number of model parameters increases the number of execution steps.
For the inference-based Scaling Laws, a larger number of intermediate tokens increases the number of execution steps.
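The link between execution steps and approximation quality can be stated with the standard time-bounded form of conditional Kolmogorov complexity. The notation below (a universal Turing machine U, a program p, a step budget t) is textbook notation assumed here for illustration and is not necessarily the notation used in the paper.

```latex
% Conditional Kolmogorov complexity of x given y: shortest program that outputs x from y.
K(x \mid y) = \min \{\, |p| : U(p, y) = x \,\}

% Time-bounded variant: the program must halt within t steps.
K^{t}(x \mid y) = \min \{\, |p| : U(p, y) = x \text{ in at most } t \text{ steps} \,\}

% A larger step budget can only tighten the upper bound.
t \le t' \;\Longrightarrow\; K^{t}(x \mid y) \ge K^{t'}(x \mid y) \ge K(x \mid y),
\qquad \lim_{t \to \infty} K^{t}(x \mid y) = K(x \mid y)
```

Read this way, any mechanism that lets the machine run for more steps, whether through more parameters or more intermediate tokens, can only tighten the upper bound on K(x | y), never loosen it.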
Abstract
In 2020, OpenAI proposed the first type of Scaling Laws, describing the relationships between model loss and the scale of parameters, data, and training computation. In 2024, OpenAI proposed the second type of Scaling Laws, describing the relationship between model inference performance and inference computation. In this paper, we analyze LLM training and inference processes from the perspective of lossless compression using conditional Kolmogorov complexity, and unify these two types of Scaling Laws. We find that both types of Scaling Laws improve the approximation of conditional Kolmogorov complexity by increasing the number of execution steps of a Turing machine. The first type of Scaling Laws increases execution steps by increasing the number of model parameters. The second type of Scaling Laws increases execution steps by increasing the number of intermediate tokens.
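As a rough illustration of the lossless-compression view, the sketch below uses DEFLATE (Python's standard `zlib` module) as a stand-in compressor. This is an assumption made for illustration only: the paper concerns LLM-based compression, not zlib, and the helper name `compressed_bits` and the sample strings are hypothetical. The length of any lossless encoding of x that can be decoded given y is, up to a constant for the decoder, an upper bound on K(x | y); supplying y as a preset dictionary plays the role of the condition.

```python
import zlib


def compressed_bits(x: bytes, y: bytes = b"", level: int = 9) -> int:
    """Bits DEFLATE needs to encode x, optionally primed with y as a preset dictionary."""
    if y:
        # zdict lets the compressor reference y without paying to encode it.
        comp = zlib.compressobj(level=level, zdict=y)
    else:
        comp = zlib.compressobj(level=level)
    return 8 * len(comp.compress(x) + comp.flush())


# Hypothetical sample data: x shares most of its content with the context y.
y = b"the quick brown fox jumps over the lazy dog. " * 4
x = b"the quick brown fox jumps over the lazy dog again."

unconditional = compressed_bits(x)   # proxy for an upper bound on K(x)
conditional = compressed_bits(x, y)  # proxy for an upper bound on K(x | y)
print(unconditional, conditional)
```

Because x largely repeats y, the conditional encoding comes out shorter than the unconditional one. A stronger compressor, such as one that spends more computation per symbol, would give a tighter bound still.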
