Kinetics: Rethinking Test-Time Scaling Laws
Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng, Yang Zhou, Emma Strubell, Beidi Chen

TL;DR
This paper introduces the Kinetics Scaling Law, which emphasizes the importance of memory access costs and sparse attention in test-time scaling, leading to more efficient and accurate model inference, especially for larger models.
Contribution
It proposes a new scaling law that accounts for memory-access bottlenecks as well as computation, and advocates sparse attention as a way to improve both inference efficiency and accuracy.
Findings
Sparse attention models outperform their dense counterparts in accuracy at the same inference cost.
Test-time compute is more effective when spent on models above a certain size threshold than on smaller ones.
Sparse attention enables longer generations and more parallel samples within the same resource budget.
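The last finding rests on sparse attention reading only part of the KV cache at each decoding step. As an illustration, here is a minimal sketch of block top-k sparse attention for a single decode step in plain Python. The selection rule (score each block of cached keys by its mean key, keep the best `top_blocks` blocks) and all function names are assumptions for illustration, not necessarily the sparse attention variant evaluated in the paper.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    """Dense scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(q))
    weights = _softmax([_dot(q, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def block_topk_attention(q, keys, values, block_size, top_blocks):
    """Hypothetical block top-k sparse attention for one decode step.

    Each block of cached keys is summarized by its mean key; only the
    `top_blocks` blocks whose summary scores highest against q are attended to.
    Returns (output, number_of_cached_tokens_read).
    """
    starts = list(range(0, len(keys), block_size))

    def block_score(s):
        # In a real system these summaries would be precomputed when the KV
        # cache is written; they are recomputed here only for brevity.
        blk = keys[s:s + block_size]
        mean = [sum(col) / len(blk) for col in zip(*blk)]
        return _dot(q, mean)

    chosen = sorted(starts, key=block_score, reverse=True)[:top_blocks]
    sel_k, sel_v = [], []
    for s in sorted(chosen):  # keep selected blocks in their original order
        sel_k.extend(keys[s:s + block_size])
        sel_v.extend(values[s:s + block_size])
    return attend(q, sel_k, sel_v), len(sel_k)
```

With `top_blocks` equal to the number of blocks the function reduces to dense attention; shrinking it reads proportionally fewer cached tokens per generated token, which is the budget that can then go to longer generations or more parallel samples.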
Abstract
We rethink test-time scaling laws from a practical efficiency perspective, revealing that the effectiveness of smaller models is significantly overestimated. Prior work, grounded in compute-optimality, overlooks critical memory access bottlenecks introduced by inference-time strategies (e.g., Best-of-N, long CoTs). Our holistic analysis, spanning models from 0.6B to 32B parameters, reveals a new Kinetics Scaling Law that better guides resource allocation by incorporating both computation and memory access costs. Kinetics Scaling Law suggests that test-time compute is more effective when used on models above a threshold than on smaller ones. A key reason is that in test-time scaling (TTS), attention, rather than parameter count, emerges as the dominant cost factor. Motivated by this, we propose a new scaling paradigm centered on sparse attention, which lowers per-token cost and enables longer generations and…
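To make the cost argument concrete, here is a back-of-the-envelope sketch of per-step decoding cost that counts memory traffic alongside FLOPs. It assumes a simplified model, effective cost = FLOPs + intensity × bytes moved, where `intensity` is a hypothetical hardware constant (FLOPs the accelerator can perform per byte of memory bandwidth). The `ModelShape` fields, the function name, and every number in the usage lines are illustrative placeholders, not figures from the paper.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    """Hypothetical model description; all fields are illustrative."""
    params: float              # total parameter count P
    layers: int                # number of transformer layers
    q_dim: int                 # query width per layer (heads * head_dim)
    kv_dim: int                # key (= value) width per layer (kv_heads * head_dim)
    bytes_per_value: int = 2   # e.g. 16-bit weights and KV cache


def decode_step_cost(m, context_len, num_samples, intensity, kv_tokens_read=None):
    """Effective cost of one decode step for `num_samples` parallel samples.

    Assumed cost model: effective = FLOPs + intensity * bytes_moved.
    `kv_tokens_read` lets a sparse-attention variant read fewer cached tokens
    than `context_len`; it defaults to dense attention.
    """
    read = context_len if kv_tokens_read is None else kv_tokens_read
    # Parameters: ~2 FLOPs per weight per sample; the weights are loaded once
    # per step and shared by every sample in the batch.
    param_flops = 2 * m.params * num_samples
    param_bytes = m.bytes_per_value * m.params
    # Attention: QK^T and AV each cost ~2 FLOPs per (query element, cached token);
    # each sample reads its own K and V cache.
    attn_flops = 4 * m.layers * m.q_dim * read * num_samples
    attn_bytes = 2 * m.layers * m.kv_dim * read * m.bytes_per_value * num_samples
    return {
        "params": param_flops + intensity * param_bytes,
        "attention": attn_flops + intensity * attn_bytes,
    }


# Placeholder small model, long generations, 8 parallel samples.
small = ModelShape(params=0.6e9, layers=28, q_dim=2048, kv_dim=1024)
cost = decode_step_cost(small, context_len=32_768, num_samples=8, intensity=300)
print(cost["attention"] > cost["params"])  # True
```

Under these assumptions the parameter term is paid once per step no matter how many samples share it, while the attention term grows with both context length and sample count, which is why long CoTs and Best-of-N shift the bottleneck toward attention. Passing a smaller `kv_tokens_read` models the saving from sparse attention.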
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Machine Learning in Materials Science · Cloud Computing and Resource Management
Methods: Softmax · Attention Is All You Need
