Measuring and Controlling Instruction (In)Stability in Language Model Dialogs

Kenneth Li; Tianle Liu; Naomi Bashkansky; David Bau; Fernanda Viégas; Hanspeter Pfister; Martin Wattenberg

arXiv:2402.10962·cs.CL·July 29, 2024·3 cites

TL;DR

This paper introduces a benchmark to evaluate instruction stability in language model dialogs, revealing significant drift over conversations and proposing a method to mitigate it, with implications for more reliable chatbot customization.

Contribution

It provides the first quantitative benchmark for instruction stability in language models and proposes a novel split-softmax method to reduce instruction drift.

Findings

01. Significant instruction drift occurs within eight conversation rounds.

02. Attention decay in transformers contributes to instruction drift.

03. Split-softmax outperforms baseline methods in maintaining instruction fidelity.

Abstract

System-prompting is a standard tool for customizing language-model chatbots, enabling them to follow a specific instruction. An implicit assumption in the use of system prompts is that they will be stable, so the chatbot will continue to generate text according to the stipulated instructions for the duration of a conversation. We propose a quantitative benchmark to test this assumption, evaluating instruction stability via self-chats between two instructed chatbots. Testing popular models like LLaMA2-chat-70B and GPT-3.5, we reveal a significant instruction drift within eight rounds of conversations. An empirical and theoretical analysis of this phenomenon suggests the transformer attention mechanism plays a role, due to attention decay over long exchanges. To combat attention decay and instruction drift, we propose a lightweight method called split-softmax, which compares favorably…
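The abstract names attention decay and split-softmax but does not give a formula on this page, so the following is only an illustrative sketch, not the authors' implementation. It assumes that "attention decay" can be summarized as the share of one attention distribution that falls on the system-prompt tokens, and it shows one possible way to shift mass back toward those tokens while keeping the distribution normalized. The function names and the exponent `k` are hypothetical and chosen for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def system_prompt_mass(attn, n_sys):
    """Share of attention placed on the first n_sys (system-prompt) tokens."""
    return sum(attn[:n_sys])


def reweight_toward_system_prompt(attn, n_sys, k):
    """Shift attention mass toward system-prompt tokens (assumed scheme).

    The system-prompt share pi is replaced by pi ** k with 0 < k <= 1,
    so k = 1 leaves the distribution unchanged and smaller k boosts the
    system prompt. The remaining tokens are scaled down so that the
    result still sums to 1.
    """
    pi = system_prompt_mass(attn, n_sys)
    if pi <= 0.0 or pi >= 1.0:
        return list(attn)  # nothing to redistribute
    new_pi = pi ** k
    sys_scale = new_pi / pi
    rest_scale = (1.0 - new_pi) / (1.0 - pi)
    return [a * (sys_scale if i < n_sys else rest_scale)
            for i, a in enumerate(attn)]


# Toy example: 2 system-prompt tokens followed by 3 conversation tokens.
attn = softmax([1.0, 1.0, 2.0, 3.0, 3.0])
boosted = reweight_toward_system_prompt(attn, n_sys=2, k=0.5)
print(system_prompt_mass(attn, 2), system_prompt_mass(boosted, 2))
```

In this toy setup the system-prompt share rises after reweighting while the total stays at 1; how such a step is applied inside a real model (which layers, which heads, which value of `k`) is not specified here and would need to be taken from the paper or its repository.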

Code & Models

Repositories

likenneth/persona_drift (PyTorch, official)

Taxonomy

Topics: Persona Design and Applications

Methods: Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Multi-Head Attention · Weight Decay · Adam · Cosine Annealing · Byte Pair Encoding