A Scaling Law for Token Efficiency in LLM Fine-Tuning Under Fixed Compute Budgets

Ryan Lagasse; Aidan Kierans; Avijit Ghosh; Shiri Dori-Hacohen

arXiv:2505.06150 · cs.CL · June 4, 2025

TL;DR

This paper proposes a new scaling law for fine-tuning large language models under fixed compute budgets. The law accounts for data composition, such as dataset volume (the number of examples and their average token length), with the goal of improving token efficiency.

Contribution

It introduces a scaling law that explicitly accounts for data composition factors, enhancing understanding of token efficiency in resource-constrained LLM fine-tuning.
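The page does not state the paper's exact functional form or fitting procedure. The sketch below therefore assumes a hypothetical multiplicative power law, loss ≈ A · N^(−α) · L̄^(−β), where N is the number of examples and L̄ is the average token length. It fits this form by ordinary least squares on the logarithms. The names `fit_two_factor_power_law` and `solve_linear`, along with the synthetic data, are illustrative only.

```python
import math


def solve_linear(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination."""
    n = len(b)
    # Augmented copy so the caller's lists are not mutated.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_two_factor_power_law(num_examples, avg_lengths, losses):
    """Fit the HYPOTHETICAL form loss = A * N**(-alpha) * Lbar**(-beta).

    log(loss) = log(A) - alpha*log(N) - beta*log(Lbar) is linear in the
    unknowns, so ordinary least squares in log space recovers them.
    Returns (A, alpha, beta).
    """
    rows = [[1.0, math.log(n), math.log(lbar)] for n, lbar in zip(num_examples, avg_lengths)]
    targets = [math.log(y) for y in losses]
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    log_a, slope_n, slope_l = solve_linear(xtx, xty)
    return math.exp(log_a), -slope_n, -slope_l


# Synthetic check: generate losses from known parameters, then recover them.
true_a, true_alpha, true_beta = 5.0, 0.3, 0.1
ns, ls, ys = [], [], []
for n in (100, 1_000, 10_000):
    for lbar in (50, 200, 800):
        ns.append(n)
        ls.append(lbar)
        ys.append(true_a * n ** -true_alpha * lbar ** -true_beta)

a, alpha, beta = fit_two_factor_power_law(ns, ls, ys)
print(f"A={a:.3f} alpha={alpha:.3f} beta={beta:.3f}")  # prints A=5.000 alpha=0.300 beta=0.100
```

Taking logarithms turns the assumed power law into a linear model, so a plain least-squares solve is enough. Published scaling-law fits often use robust losses and nonlinear optimizers instead, and this sketch omits both.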

Findings

01. Data composition significantly affects token efficiency.
02. Refined scaling laws can inform fine-tuning strategies.
03. Experiments validate the importance of dataset volume.

Abstract

We introduce a scaling law for fine-tuning large language models (LLMs) under fixed compute budgets that explicitly accounts for data composition. Conventional approaches measure training data solely by total tokens, yet the number of examples and their average token length (what we term dataset volume) play a decisive role in model performance. Our formulation is tuned following established procedures. Experiments on the BRICC dataset [salavati2024reducing] and subsets of the MMLU dataset [hendrycks2021measuringmassivemultitasklanguage], evaluated under multiple subsampling strategies, reveal that data composition significantly affects token efficiency. These results motivate refined scaling laws for practical LLM fine-tuning in resource-constrained settings.
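Total tokens equal the number of examples times their average length, so a single token budget can be met by many different compositions. The sketch below illustrates this with synthetic per-example token lengths and three hypothetical subsampling rules: random order, shortest-first, and longest-first. These rules, the helpers `fill_budget` and `composition`, and the budget value are assumptions for illustration. They are not the subsampling strategies used in the paper's BRICC and MMLU experiments.

```python
import random


def composition(lengths):
    """Return (number of examples, average token length, total tokens)."""
    total = sum(lengths)
    return len(lengths), total / len(lengths), total


def fill_budget(lengths, budget):
    """Walk examples in order, keeping each one that still fits in the remaining budget."""
    chosen, used = [], 0
    for n_tok in lengths:
        if used + n_tok <= budget:
            chosen.append(n_tok)
            used += n_tok
    return chosen


rng = random.Random(0)
# Hypothetical per-example token lengths; a real pipeline would tokenize the dataset.
pool = [rng.randint(20, 400) for _ in range(5_000)]
budget = 50_000

shuffled = pool[:]
rng.shuffle(shuffled)
random_subset = fill_budget(shuffled, budget)
shortest_first = fill_budget(sorted(pool), budget)
longest_first = fill_budget(sorted(pool, reverse=True), budget)

for name, subset in [("random", random_subset),
                     ("shortest-first", shortest_first),
                     ("longest-first", longest_first)]:
    n, avg, total = composition(subset)
    print(f"{name:>14}: N={n:5d}  avg_len={avg:6.1f}  total={total}")
```

All three subsets spend nearly the same token budget. Even so, shortest-first keeps many more, shorter examples than longest-first, and this is exactly the difference that a token-only accounting hides.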

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management