Power-Law Decay Loss for Large Language Model Finetuning: A Theory Perspective

Jintian Shao

arXiv:2505.16900 · cs.CL · June 10, 2025

TL;DR

This paper proposes Power-Law Decay Loss (PDL), a novel loss function for finetuning large language models that re-weights tokens based on their frequency, emphasizing low-frequency, informative tokens to improve text generation quality.

Contribution

The paper introduces PDL, a theoretically motivated loss function that scales each token's weight by a power-law decay in its corpus frequency, with the aim of improving finetuning outcomes for text generation tasks.

Findings

1. PDL improves diversity and informativeness in generated text.

2. PDL is theoretically grounded in information theory and linguistics.

3. PDL is applicable to text generation tasks such as summarization and dialogue.

Abstract

During the finetuning stage of text generation tasks, standard cross-entropy loss treats all tokens equally. This can lead models to overemphasize high-frequency, low-information tokens, neglecting lower-frequency tokens crucial for specificity and informativeness in generated content. This paper introduces a novel loss function, Power-Law Decay Loss (PDL), specifically designed to optimize the finetuning process for text generation. The core motivation for PDL stems from observations in information theory and linguistics: the informativeness of a token is often inversely proportional to its frequency of occurrence. PDL re-weights the contribution of each token in the standard cross-entropy loss based on its frequency in the training corpus, following a power-law decay. Specifically, the weights for high-frequency tokens are reduced, while low-frequency, information-dense tokens are…
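The re-weighting described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the author's implementation. It assumes a weight of the form 1 / (freq + eps)^alpha, where freq is the token's relative frequency in the training corpus, and it assumes the loss is normalized by the sum of the weights. The function names and the `alpha` and `eps` parameters are hypothetical, and the excerpt above does not confirm the exact formula.

```python
import math
from collections import Counter


def power_law_weights(token_ids, alpha=1.0, eps=1e-9):
    """Return {token_id: weight}, with weight = 1 / (relative_freq + eps) ** alpha.

    Assumed form: larger alpha down-weights frequent tokens more strongly,
    and alpha = 0 gives every token a weight of 1 (plain cross-entropy).
    """
    counts = Counter(token_ids)
    total = sum(counts.values())
    return {tok: 1.0 / ((c / total) + eps) ** alpha for tok, c in counts.items()}


def weighted_cross_entropy(log_probs, targets, weights):
    """Weighted mean of per-token negative log-likelihood.

    log_probs[i][t] is the model's log-probability of token t at position i.
    Dividing by the sum of weights is an assumption made for this sketch.
    """
    numerator = 0.0
    denominator = 0.0
    for position_log_probs, target in zip(log_probs, targets):
        w = weights[target]
        numerator += w * -position_log_probs[target]
        denominator += w
    return numerator / denominator


# Toy corpus: token 0 is frequent, token 2 is rare.
corpus = [0] * 6 + [1] * 3 + [2]
weights = power_law_weights(corpus, alpha=1.0)

# The same toy predictive distribution at two positions, with targets 0 and 2.
lp = [math.log(0.7), math.log(0.2), math.log(0.1)]
loss = weighted_cross_entropy([lp, lp], [0, 2], weights)
```

With `alpha=0.0` every weight equals 1 and the function reduces to the ordinary mean cross-entropy, which makes the standard loss a special case of this sketch.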

Code & Models

Repositories

shaojintian/power_law_decay (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Opinion Dynamics and Social Influence

Methods: Focus