On the Loss of Context-awareness in General Instruction Fine-tuning

Yihan Wang; Andrew Bai; Nanyun Peng; Cho-Jui Hsieh

arXiv:2411.02688 · cs.CL · February 4, 2025 · 2 citations

TL;DR

This paper investigates how supervised fine-tuning of large language models can diminish their ability to understand and utilize context, and proposes a method to preserve this context-awareness during fine-tuning.

Contribution

The authors trace the loss of context awareness in instruction fine-tuned LLMs to a role bias learned during conversational fine-tuning, and introduce a conditional fine-tuning approach that uses a context-dependency indicator to maintain context understanding.

Findings

01. The loss of context awareness is linked to role bias learned during fine-tuning.

02. Applying a context-dependency indicator improves context retention.

03. The method preserves instruction-following capabilities across multiple tasks.
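
The context-dependency indicator in finding 02 can be pictured with a small sketch. This summary does not say how the indicator is represented or how context-dependent examples are identified, so the `[CTX]` marker, the `Example` type, and the `context_dependent` flag below are hypothetical placeholders, not the authors' implementation. The sketch only shows the general idea of marking some training instructions so that a model can learn to condition on the marker.

```python
from dataclasses import dataclass

# Hypothetical marker string; the real indicator is not specified in this summary.
INDICATOR = "[CTX]"


@dataclass(frozen=True)
class Example:
    instruction: str
    response: str
    # Assumed to be labeled by some upstream step that is not shown here.
    context_dependent: bool


def add_indicator(example: Example, indicator: str = INDICATOR) -> Example:
    """Prefix the instruction with the indicator if the example is context-dependent."""
    if not example.context_dependent:
        return example  # leave context-free examples unchanged
    return Example(
        instruction=f"{indicator} {example.instruction}",
        response=example.response,
        context_dependent=True,
    )


if __name__ == "__main__":
    data = [
        Example("Summarize the passage above.", "...", True),
        Example("Name a prime number.", "2", False),
    ]
    for ex in map(add_indicator, data):
        print(ex.instruction)
```

Under these assumptions, the same marker would be added at inference time whenever the user's prompt carries context that the response should rely on.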

Abstract

Pre-trained Large Language Models (LLMs) require post-training methods such as supervised fine-tuning (SFT) on instruction-response pairs to enable instruction following. However, this process can potentially harm existing capabilities learned during pre-training. In this paper, we investigate the loss of context awareness after SFT, where context awareness is defined as the ability to extract and understand information from user-provided context and respond accordingly. We identify and demonstrate that the loss of context awareness, particularly in open-source models, occurs in instruction fine-tuned LLMs when the chat template is applied to input prompts. We identify that the performance decline is associated with a bias toward different roles learned during conversational instruction fine-tuning. We demonstrate this correlation by visualizing changes in attention allocation after the…

Code & Models

Repositories

YihanWang617/context_awareness (PyTorch, official)

Taxonomy

Topics: Machine Learning and Algorithms · Intelligent Tutoring Systems and Adaptive Learning · Video Analysis and Summarization

Methods: Softmax · Attention Is All You Need · ALIGN · Shrink and Fine-Tune · Focus