On the Loss of Context-awareness in General Instruction Fine-tuning
Yihan Wang, Andrew Bai, Nanyun Peng, Cho-Jui Hsieh

TL;DR
This paper investigates how supervised fine-tuning of large language models can diminish their ability to understand and utilize context, and proposes a method to preserve this context-awareness during fine-tuning.
Contribution
The authors identify the cause of context-awareness loss in instruction fine-tuned LLMs and introduce a conditional fine-tuning approach to maintain context understanding.
Findings
The loss of context awareness is linked to a bias toward different conversational roles that the model learns during instruction fine-tuning.
Conditioning fine-tuning on a context-dependency indicator helps the model retain its context awareness.
The method preserves instruction-following capabilities across multiple tasks.
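The indicator idea in the findings above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the marker string `[CTX]`, the `context_score` field, and the 0.5 threshold are all hypothetical, since this summary does not say how context dependency is measured or which token is used.

```python
from dataclasses import dataclass

INDICATOR = "[CTX]"  # hypothetical marker string; the summary does not name one


@dataclass
class Example:
    instruction: str
    response: str
    context_score: float  # hypothetical estimate in [0, 1] of reliance on user context


def add_indicator(examples, threshold=0.5):
    """Return (prompt, response) pairs, tagging context-dependent prompts."""
    tagged = []
    for ex in examples:
        if ex.context_score >= threshold:
            # Mark the instruction so fine-tuning can condition on the marker.
            prompt = f"{INDICATOR} {ex.instruction}"
        else:
            prompt = ex.instruction
        tagged.append((prompt, ex.response))
    return tagged


examples = [
    Example("Summarize the passage above in one sentence.", "A short summary.", 0.9),
    Example("Write a haiku about autumn.", "Leaves drift to the ground.", 0.1),
]
tagged = add_indicator(examples)
```

Under these assumptions, the same marker would be prepended at inference time whenever the user wants the model to lean on the supplied context.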
Abstract
Pre-trained Large Language Models (LLMs) require post-training methods such as supervised fine-tuning (SFT) on instruction-response pairs to enable instruction following. However, this process can potentially harm existing capabilities learned during pre-training. In this paper, we investigate the loss of context awareness after SFT, where context awareness is defined as the ability to extract and understand information from user-provided context and respond accordingly. We identify and demonstrate that the loss of context awareness, particularly in open-source models, occurs in instruction fine-tuned LLMs when the chat template is applied to input prompts. We identify that the performance decline is associated with a bias toward different roles learned during conversational instruction fine-tuning. We demonstrate this correlation by visualizing changes in attention allocation after the…
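The abstract refers to attention allocation across conversational roles. One simple way to quantify such an allocation, shown here only as an illustrative sketch, is to sum the attention weights that fall on the tokens of each role. The function name, the role labels, and the toy weights are assumptions for illustration and are not taken from the paper.

```python
def attention_by_role(weights, roles):
    """Sum attention mass per role for one query position.

    weights: attention weights over the input tokens (assumed to sum to 1).
    roles:   a role label for each token, same length as weights.
    """
    if len(weights) != len(roles):
        raise ValueError("weights and roles must have the same length")
    totals = {}
    for weight, role in zip(weights, roles):
        totals[role] = totals.get(role, 0.0) + weight
    return totals


# Toy example: four tokens, labelled by the chat-template segment they belong to.
weights = [0.1, 0.2, 0.3, 0.4]
roles = ["system", "user", "user", "assistant"]
shares = attention_by_role(weights, roles)
```

Comparing the share assigned to user tokens before and after fine-tuning would be one way to look for the kind of role bias described above.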
Taxonomy
Topics: Machine Learning and Algorithms · Intelligent Tutoring Systems and Adaptive Learning · Video Analysis and Summarization
Methods: Softmax · Attention Is All You Need · ALIGN · Shrink and Fine-Tune · Focus
