From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning

Xuansheng Wu; Wenlin Yao; Jianshu Chen; Xiaoman Pan; Xiaoyang Wang; Ninghao Liu; Dong Yu

arXiv:2310.00492 · cs.CL · April 5, 2024 · 2 cites

TL;DR

This paper investigates how instruction tuning modifies large language models internally, revealing that it enhances instruction recognition, alters attention patterns, and shifts knowledge towards user-oriented tasks, thus improving alignment with user intentions.

Contribution

The study introduces new explanation methods to analyze internal model changes due to instruction tuning, providing insights into how models adapt at the layer and attention head levels.

Findings

01 Instruction tuning improves instruction recognition in LLMs.

02 Instruction tuning causes attention heads to focus more on instruction-related words.

03 Instruction tuning shifts the knowledge in feed-forward networks towards user-oriented tasks.

Abstract

Large Language Models (LLMs) have achieved remarkable success, and instruction tuning is the critical step in aligning LLMs with user intentions. In this work, we investigate how instruction tuning adjusts pre-trained models, with a focus on intrinsic changes. Specifically, we first develop several local and global explanation methods, including a gradient-based method for input-output attribution, and techniques for interpreting patterns and concepts in self-attention and feed-forward layers. The impact of instruction tuning is then studied by comparing the explanations derived from the pre-trained and instruction-tuned models. This approach provides an internal perspective on the model shifts at a human-comprehensible level. Our findings reveal three significant impacts of instruction tuning: 1) It empowers LLMs to recognize the instruction parts of user prompts, and promotes the…
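To make the idea of gradient-based input-output attribution and of comparing explanations across two models more concrete, here is a minimal sketch in plain Python. It is not the paper's method: it uses gradient-times-input, a common attribution technique, on a hypothetical toy model (sum-pooled token embeddings, a tanh layer, and a softmax output). The functions `attribution` and `attribution_shift`, the embeddings, and both weight matrices are illustrative assumptions, not values or names from the paper or its repository.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def attribution(embeddings, W, target):
    """Normalized gradient-times-input score of each input token
    for log p(target) under a toy model (an assumption, not the paper's)."""
    d = len(embeddings[0])
    n_out = len(W)
    # Forward pass: h = tanh(sum of token embeddings), logits = W h.
    pooled = [sum(e[k] for e in embeddings) for k in range(d)]
    h = [math.tanh(v) for v in pooled]
    logits = [sum(W[c][k] * h[k] for k in range(d)) for c in range(n_out)]
    p = softmax(logits)
    # Backward pass: d log p[target] / d logits = onehot(target) - p.
    dlogits = [(1.0 if c == target else 0.0) - p[c] for c in range(n_out)]
    dh = [sum(dlogits[c] * W[c][k] for c in range(n_out)) for k in range(d)]
    dpooled = [dh[k] * (1.0 - h[k] ** 2) for k in range(d)]
    # Pooling is a plain sum, so every token embedding receives the same
    # gradient; the scores differ only through the embeddings themselves.
    scores = [abs(sum(dpooled[k] * e[k] for k in range(d))) for e in embeddings]
    total = sum(scores) or 1.0
    return [s / total for s in scores]


def attribution_shift(scores_before, scores_after):
    """Per-token change in attribution between two models."""
    return [b - a for a, b in zip(scores_before, scores_after)]


# Hypothetical inputs: three tokens with 2-dimensional embeddings, and two
# made-up weight matrices standing in for a "pre-trained" and a "tuned" model.
tokens = ["Translate", "the", "sentence"]
toy_embeddings = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
W_pretrained = [[1.0, 0.0], [0.0, 1.0]]
W_tuned = [[2.0, -1.0], [0.0, 1.0]]

before = attribution(toy_embeddings, W_pretrained, target=0)
after = attribution(toy_embeddings, W_tuned, target=0)
for tok, delta in zip(tokens, attribution_shift(before, after)):
    print(f"{tok}: {delta:+.3f}")
```

A positive shift for a token means the second toy model relies on it more when producing the target output. With real LLMs the gradients would come from an autodiff framework, and the scores would be computed per generated token.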

Code & Models

Repositories

jacksonwuxs/interpret_instruction_tuning_llms (Official)

Videos

From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning · underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)

Methods: Focus · ALIGN