On the Effect of Instruction Tuning Loss on Generalization

Anwoy Chatterjee; H S V N S Kowndinya Renduchintala; Sumit Bhatia; Tanmoy Chakraborty

arXiv:2507.07817 · cs.CL · July 16, 2025

TL;DR

This paper investigates how different weightings of prompt and response tokens in the instruction tuning loss affect model performance, and proposes Weighted Instruction Tuning (WIT) to improve robustness and generalization.

Contribution

It introduces WIT, a novel loss weighting method for instruction tuning, demonstrating its effectiveness across multiple models, datasets, and benchmarks.

Findings

01. Optimal prompt and response weightings improve performance.

02. Standard instruction tuning loss often limits robustness.

03. WIT enhances model generalization and robustness.

Abstract

Instruction Tuning has emerged as a pivotal post-training paradigm that enables pre-trained language models to better follow user instructions. Despite its significance, little attention has been given to optimizing the loss function used. A fundamental, yet often overlooked, question is whether the conventional auto-regressive objective, in which loss is computed only on response tokens and prompt tokens are excluded, is truly optimal for instruction tuning. In this work, we systematically investigate the impact of differentially weighting prompt and response tokens in instruction tuning loss, and propose Weighted Instruction Tuning (WIT) as a better alternative to conventional instruction tuning. Through extensive experiments on five language models of different families and scales, three finetuning datasets of different sizes, and five diverse evaluation benchmarks, we show that the standard…
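To make the idea of differentially weighting prompt and response tokens concrete, the following is a minimal sketch rather than the authors' implementation. The function name `weighted_instruction_loss`, the parameter names `prompt_weight` and `response_weight`, and the choice to normalize by the total weight are all assumptions made for illustration; the abstract does not specify the exact normalization WIT uses. Setting `prompt_weight=0.0` and `response_weight=1.0` recovers the conventional objective described above, where loss is computed only on response tokens.

```python
import math
from typing import Sequence


def weighted_instruction_loss(
    token_log_probs: Sequence[float],
    is_prompt: Sequence[bool],
    prompt_weight: float,
    response_weight: float,
) -> float:
    """Weighted mean negative log-likelihood over one sequence.

    Hypothetical helper: normalizing by the sum of weights is an
    assumption of this sketch, not a detail taken from the paper.
    """
    if len(token_log_probs) != len(is_prompt):
        raise ValueError("token_log_probs and is_prompt must have equal length")

    weighted_nll = 0.0
    total_weight = 0.0
    for log_p, prompt_flag in zip(token_log_probs, is_prompt):
        # Each token contributes its NLL scaled by the weight of its segment.
        weight = prompt_weight if prompt_flag else response_weight
        weighted_nll += -weight * log_p
        total_weight += weight

    if total_weight == 0.0:
        raise ValueError("at least one token must carry a non-zero weight")
    return weighted_nll / total_weight


# Toy sequence: two prompt tokens followed by two response tokens.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.125)]
prompt_mask = [True, True, False, False]

# Conventional instruction tuning: prompt tokens are ignored.
conventional = weighted_instruction_loss(log_probs, prompt_mask, 0.0, 1.0)

# A weighted variant: prompt tokens count at half the weight of response tokens.
weighted = weighted_instruction_loss(log_probs, prompt_mask, 0.5, 1.0)
```

In this sketch the two weights act as hyperparameters; which values work best is an empirical question that the paper's experiments address, and the values `0.5` and `1.0` here are arbitrary.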
