Kinetics: Rethinking Test-Time Scaling Laws

Ranajoy Sadhukhan; Zhuoming Chen; Haizhong Zheng; Yang Zhou; Emma Strubell; Beidi Chen

arXiv:2506.05333 · cs.LG · June 23, 2025

Open Access · 1 Repo · 1 Dataset

TL;DR

This paper introduces the Kinetics Scaling Law, which accounts for memory access costs in test-time scaling and motivates sparse attention, leading to more efficient and accurate inference, especially for larger models.

Contribution

The paper proposes a new scaling law that incorporates memory access bottlenecks, and it advocates sparse attention as a way to improve inference efficiency and accuracy.

Findings

01. Sparse attention models outperform dense models in accuracy.

02. Test-time compute is more effective when spent on models above a certain size threshold than on smaller ones.

03. Sparse attention enables longer generations and more parallel samples within the same resource budget.

Abstract

We rethink test-time scaling laws from a practical efficiency perspective, revealing that the effectiveness of smaller models is significantly overestimated. Prior work, grounded in compute-optimality, overlooks critical memory access bottlenecks introduced by inference-time strategies (e.g., Best-of-$N$, long CoTs). Our holistic analysis, spanning models from 0.6B to 32B parameters, reveals a new Kinetics Scaling Law that better guides resource allocation by incorporating both computation and memory access costs. The Kinetics Scaling Law suggests that test-time compute is more effective when used on models above a threshold than on smaller ones. A key reason is that in test-time scaling (TTS), attention, rather than parameter count, emerges as the dominant cost factor. Motivated by this, we propose a new scaling paradigm centered on sparse attention, which lowers per-token cost and enables longer generations and…
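To make the abstract's central claim concrete, here is a minimal back-of-the-envelope sketch of why attention, rather than parameter count, can dominate decoding cost once generations are long and many samples run in parallel. It is a roofline-style simplification and not necessarily the paper's exact formulation. The `ModelShape` values, the `intensity` constant (FLOPs the hardware can perform per byte moved), and the `kv_budget` cap used as a stand-in for sparse attention are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    """Hypothetical transformer shape (illustrative values, not taken from the paper)."""
    n_params: float   # total parameter count
    n_layers: int
    n_q_heads: int    # query heads per layer
    n_kv_heads: int   # key/value heads per layer (grouped-query attention)
    head_dim: int


def decode_step_cost(model, context_len, n_samples, kv_budget=None,
                     bytes_per_value=2, intensity=500.0):
    """Rough cost of one decoding step across n_samples parallel sequences.

    Cost is expressed in "equivalent FLOPs": flops + intensity * bytes_moved,
    where intensity (FLOPs per byte) is an assumed hardware constant.
    kv_budget, if set, caps how many cached tokens each query attends to,
    as a stand-in for sparse attention (token-selection overhead is ignored).
    """
    attended = context_len if kv_budget is None else min(context_len, kv_budget)

    # Linear layers: about 2 FLOPs per parameter per generated token.
    param_flops = 2 * model.n_params * n_samples
    # Weights are read once per step and shared by every sequence in the batch.
    param_bytes = model.n_params * bytes_per_value

    # Attention: QK^T and AV each cost 2 * head_dim FLOPs per query head per attended token.
    attn_flops = 4 * model.n_layers * model.n_q_heads * model.head_dim * attended * n_samples
    # Keys and values are read per sequence; the KV cache is not shared across samples.
    kv_bytes = (2 * model.n_layers * model.n_kv_heads * model.head_dim
                * attended * n_samples * bytes_per_value)

    return {
        "parameters": param_flops + intensity * param_bytes,
        "attention": attn_flops + intensity * kv_bytes,
    }


# Assumed workload: 32 parallel samples, each with a 16,384-token context.
model = ModelShape(n_params=8e9, n_layers=36, n_q_heads=32, n_kv_heads=8, head_dim=128)
dense = decode_step_cost(model, context_len=16384, n_samples=32)
sparse = decode_step_cost(model, context_len=16384, n_samples=32, kv_budget=1024)

for name, cost in (("dense", dense), ("sparse", sparse)):
    share = cost["attention"] / (cost["parameters"] + cost["attention"])
    print(f"{name}: attention share of step cost = {share:.2f}")
```

Under these assumed numbers, the dense configuration spends most of its per-step cost on attention, because KV-cache reads grow with both context length and sample count while weight reads are amortized across the batch. Capping the attended tokens shrinks only the attention term, which is the intuition behind spending the freed budget on longer generations or more samples.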

Code & Models

Repositories

infini-ai-lab/kinetics
PyTorch · Official

Datasets

InfiniAILab/Kinetics-generations
dataset · 21 downloads

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Machine Learning in Materials Science · Cloud Computing and Resource Management

Methods: Softmax · Attention Is All You Need