On the Effect of Instruction Tuning Loss on Generalization
Anwoy Chatterjee, H S V N S Kowndinya Renduchintala, Sumit Bhatia, Tanmoy Chakraborty

TL;DR
This paper investigates how different weighting schemes for prompt and response tokens in instruction tuning loss affect model performance, proposing Weighted Instruction Tuning (WIT) to improve robustness and generalization.
Contribution
The paper introduces WIT, a loss weighting method for instruction tuning that assigns separate weights to prompt and response tokens, and demonstrates its effectiveness across multiple models, datasets, and benchmarks.
Findings
Optimal prompt and response weightings improve performance.
Standard instruction tuning loss often limits robustness.
WIT enhances model generalization and robustness.
Abstract
Instruction Tuning has emerged as a pivotal post-training paradigm that enables pre-trained language models to better follow user instructions. Despite its significance, little attention has been given to optimizing the loss function used. A fundamental, yet often overlooked, question is whether the conventional auto-regressive objective - where loss is computed only on response tokens, excluding prompt tokens - is truly optimal for instruction tuning. In this work, we systematically investigate the impact of differentially weighting prompt and response tokens in instruction tuning loss, and propose Weighted Instruction Tuning (WIT) as a better alternative to conventional instruction tuning. Through extensive experiments on five language models of different families and scale, three finetuning datasets of different sizes, and five diverse evaluation benchmarks, we show that the standard…
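To make the contrast in the abstract concrete, the following is a minimal sketch of a loss that weights prompt and response tokens separately. It is an illustration under stated assumptions, not the authors' implementation: the function name, the `prompt_weight` and `response_weight` parameters, and in particular the normalization (dividing by the number of tokens with a non-zero weight) are hypothetical choices made for this example. Setting `prompt_weight=0` and `response_weight=1` recovers the conventional objective described above, where loss is computed only on response tokens.

```python
import math


def weighted_instruction_loss(token_log_probs, is_response,
                              prompt_weight, response_weight):
    """Weighted negative log-likelihood for one (prompt, response) example.

    token_log_probs: log-probability the model assigned to each target token.
    is_response:     parallel list of booleans, True for response tokens.
    prompt_weight, response_weight: hypothetical scalar weights (assumption).
    """
    if len(token_log_probs) != len(is_response):
        raise ValueError("token_log_probs and is_response must align")

    weighted_nll = 0.0
    active_tokens = 0  # tokens whose weight is non-zero
    for log_prob, response_token in zip(token_log_probs, is_response):
        weight = response_weight if response_token else prompt_weight
        weighted_nll += -weight * log_prob
        if weight != 0:
            active_tokens += 1

    if active_tokens == 0:
        raise ValueError("at least one weight must be non-zero")

    # Assumption of this sketch: normalize by the count of active tokens.
    return weighted_nll / active_tokens


# Toy example: two prompt tokens followed by two response tokens.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.125)]
mask = [False, False, True, True]

# Conventional objective: prompt tokens are excluded from the loss.
conventional = weighted_instruction_loss(log_probs, mask, 0.0, 1.0)

# Differential weighting: prompt tokens contribute at half weight.
weighted = weighted_instruction_loss(log_probs, mask, 0.5, 1.0)
```

In a training loop, the same idea would be applied to per-token cross-entropy values from the model, with the two weights treated as hyperparameters to tune.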
Taxonomy
Topics: Advanced Measurement and Metrology Techniques · Structural Health Monitoring Techniques · Color Science and Applications
