Power-Law Decay Loss for Large Language Model Finetuning: A Theory Perspective
Jintian Shao

TL;DR
This paper proposes Power-Law Decay Loss (PDL), a novel loss function for finetuning large language models that re-weights tokens based on their frequency, emphasizing low-frequency, informative tokens to improve text generation quality.
Contribution
It introduces PDL, a theoretically motivated loss function that adjusts token weights according to a power-law decay to enhance finetuning outcomes for text generation tasks.
Findings
PDL improves diversity and informativeness in generated text.
The method is motivated by observations from information theory and linguistics.
It is applicable to various text generation tasks, such as summarization and dialogue.
Abstract
During the finetuning stage of text generation tasks, standard cross-entropy loss treats all tokens equally. This can lead models to overemphasize high-frequency, low-information tokens, neglecting lower-frequency tokens crucial for specificity and informativeness in generated content. This paper introduces a novel loss function, Power-Law Decay Loss (PDL), specifically designed to optimize the finetuning process for text generation. The core motivation for PDL stems from observations in information theory and linguistics: the informativeness of a token is often inversely proportional to its frequency of occurrence. PDL re-weights the contribution of each token in the standard cross-entropy loss based on its frequency in the training corpus, following a power-law decay. Specifically, the weights for high-frequency tokens are reduced, while low-frequency, information-dense tokens are…
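The re-weighting described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the abstract is truncated before any formula is given, so the weight form w(t) = 1 / (freq(t) + eps)^alpha, the parameter names `alpha` and `eps`, the helper names `power_law_weights` and `pdl_loss`, and the choice to normalize by the sum of weights are all assumptions made here for illustration.

```python
import numpy as np

def power_law_weights(token_counts, alpha=1.0, eps=1e-7):
    """Per-token weights that decay as a power law of relative frequency.

    Assumed form: w(t) = 1 / (freq(t) + eps) ** alpha.
    alpha = 0 recovers uniform weights (standard cross-entropy).
    """
    counts = np.asarray(token_counts, dtype=np.float64)
    freq = counts / counts.sum()          # relative frequency in the corpus
    return 1.0 / (freq + eps) ** alpha

def pdl_loss(logits, targets, weights):
    """Frequency-weighted cross-entropy over a sequence of target tokens.

    logits:  (num_positions, vocab_size) unnormalized scores
    targets: (num_positions,) integer token ids
    weights: (vocab_size,) per-token weights from power_law_weights
    """
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets)
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of each target token.
    nll = -log_probs[np.arange(len(targets)), targets]
    w = weights[targets]
    # Assumption: normalize by total weight so the loss scale stays comparable.
    return float((w * nll).sum() / w.sum())

# Toy vocabulary of three tokens with very different corpus counts.
weights = power_law_weights([900, 90, 10], alpha=1.0)
logits = np.array([[2.0, 0.5, 0.1], [0.2, 0.3, 1.5]])
targets = np.array([0, 2])
loss = pdl_loss(logits, targets, weights)
```

With `alpha=0.0` every weight equals 1 and the function reduces to ordinary mean cross-entropy, which makes the exponent a convenient knob for interpolating between the standard loss and a strongly frequency-penalized one.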
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Opinion Dynamics and Social Influence
