From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning
Xuansheng Wu, Wenlin Yao, Jianshu Chen, Xiaoman Pan, Xiaoyang Wang, Ninghao Liu, Dong Yu

TL;DR
This paper investigates how instruction tuning modifies large language models internally, revealing that it enhances instruction recognition, alters attention patterns, and shifts knowledge towards user-oriented tasks, thus improving alignment with user intentions.
Contribution
The study introduces new explanation methods to analyze internal model changes due to instruction tuning, providing insights into how models adapt at the layer and attention head levels.
Findings
Instruction tuning improves instruction recognition in LLMs.
It causes attention heads to focus more on instruction-related words.
Feed-forward networks shift their knowledge towards user-oriented tasks.
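The second finding can be made concrete by measuring how much of each position's attention mass falls on instruction tokens. The snippet below is a minimal sketch under stated assumptions: the function name, the toy attention matrix, and the instruction mask are all hypothetical, and this is not claimed to be the metric the paper uses.

```python
import numpy as np

def instruction_attention_share(attn, instruction_mask):
    """Fraction of each query position's attention that lands on instruction tokens.

    attn: (seq_len, seq_len) array; row i is the attention distribution of
          query position i over key positions, so each row sums to 1.
    instruction_mask: (seq_len,) boolean array, True where the key token
          belongs to the instruction part of the prompt (assumed to be known).
    """
    attn = np.asarray(attn, dtype=float)
    mask = np.asarray(instruction_mask, dtype=bool)
    # Keep only the columns for instruction tokens and sum them per query row.
    return attn[:, mask].sum(axis=1)

# Hypothetical 3-token prompt: tokens 0 and 1 are instruction, token 2 is context.
attn = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
])
mask = np.array([True, True, False])
share = instruction_attention_share(attn, mask)
```

Computing this quantity for the same head in a pre-trained model and in its instruction-tuned counterpart would give one rough way to compare how strongly each attends to the instruction.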
Abstract
Large Language Models (LLMs) have achieved remarkable success, where instruction tuning is the critical step in aligning LLMs with user intentions. In this work, we investigate how the instruction tuning adjusts pre-trained models with a focus on intrinsic changes. Specifically, we first develop several local and global explanation methods, including a gradient-based method for input-output attribution, and techniques for interpreting patterns and concepts in self-attention and feed-forward layers. The impact of instruction tuning is then studied by comparing the explanations derived from the pre-trained and instruction-tuned models. This approach provides an internal perspective of the model shifts on a human-comprehensible level. Our findings reveal three significant impacts of instruction tuning: 1) It empowers LLMs to recognize the instruction parts of user prompts, and promotes the…
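To illustrate what a gradient-based input-output attribution can look like, here is a minimal gradient-times-input sketch on a toy scorer. Everything in it is an assumption made for illustration: the helper name, the linear toy model, and the use of finite differences in place of automatic differentiation. It is not the paper's exact formulation.

```python
import numpy as np

def grad_times_input(score_fn, embeddings, eps=1e-5):
    """Per-token attribution: sum over embedding dims of d(score)/d(emb) * emb.

    The gradient is estimated with central finite differences, standing in
    for the autograd call a real framework would provide.
    """
    emb = np.asarray(embeddings, dtype=float)
    grad = np.zeros_like(emb)
    for idx in np.ndindex(emb.shape):
        plus, minus = emb.copy(), emb.copy()
        plus[idx] += eps
        minus[idx] -= eps
        grad[idx] = (score_fn(plus) - score_fn(minus)) / (2 * eps)
    # Collapse the embedding dimension to get one score per input token.
    return (grad * emb).sum(axis=1)

# Hypothetical toy model: the score of one output token is a linear
# function of the summed input embeddings.
w = np.array([1.0, -2.0])

def toy_score(emb):
    return float(w @ emb.sum(axis=0))

tokens = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]])
attributions = grad_times_input(toy_score, tokens)
```

Because the toy scorer is linear, the per-token attributions add up to the score itself, which makes the example easy to check by hand. Running the same attribution on a pre-trained and an instruction-tuned model and comparing the results mirrors the comparison strategy the abstract describes.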
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
Methods: Focus · ALIGN
