Measuring and Controlling Instruction (In)Stability in Language Model Dialogs
Kenneth Li, Tianle Liu, Naomi Bashkansky, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg

TL;DR
This paper introduces a benchmark to evaluate instruction stability in language model dialogs, revealing significant drift over conversations and proposing a method to mitigate it, with implications for more reliable chatbot customization.
Contribution
It provides the first quantitative benchmark for instruction stability in language models and proposes a novel split-softmax method to reduce instruction drift.
Findings
Significant instruction drift occurs within eight conversation rounds.
Attention decay in transformers contributes to instruction drift.
Split-softmax outperforms baseline methods in maintaining instruction fidelity.
Abstract
System-prompting is a standard tool for customizing language-model chatbots, enabling them to follow a specific instruction. An implicit assumption in the use of system prompts is that they will be stable, so the chatbot will continue to generate text according to the stipulated instructions for the duration of a conversation. We propose a quantitative benchmark to test this assumption, evaluating instruction stability via self-chats between two instructed chatbots. Testing popular models like LLaMA2-chat-70B and GPT-3.5, we reveal a significant instruction drift within eight rounds of conversations. An empirical and theoretical analysis of this phenomenon suggests the transformer attention mechanism plays a role, due to attention decay over long exchanges. To combat attention decay and instruction drift, we propose a lightweight method called split-softmax, which compares favorably…
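The attention-decay argument in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's implementation: it assumes uniform raw attention scores, and the function names, the `n_system` parameter, and the power-based reweighting with exponent `k` are hypothetical choices made only to show how the share of attention on system-prompt tokens shrinks as a dialog grows, and how that share could be shifted back.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def system_prompt_mass(weights, n_system):
    """Fraction of attention that falls on the first n_system tokens."""
    return sum(weights[:n_system])


def reweight_toward_system(weights, n_system, k):
    """Hypothetical reallocation (an assumption, not the paper's code):
    raise the system-prompt mass p to p**k with 0 < k <= 1, then rescale
    both token groups so the weights still sum to 1."""
    p = system_prompt_mass(weights, n_system)
    if p <= 0.0 or p >= 1.0:
        return list(weights)  # nothing to reallocate
    new_p = p ** k
    sys_scale = new_p / p
    rest_scale = (1.0 - new_p) / (1.0 - p)
    return [
        w * (sys_scale if i < n_system else rest_scale)
        for i, w in enumerate(weights)
    ]


if __name__ == "__main__":
    n_system = 4  # assumed length of the system prompt, in tokens
    for context_len in (8, 16, 32, 64):
        # Uniform scores stand in for a model that attends evenly.
        weights = softmax([0.0] * context_len)
        before = system_prompt_mass(weights, n_system)
        after = system_prompt_mass(
            reweight_toward_system(weights, n_system, k=0.5), n_system
        )
        print(context_len, round(before, 4), round(after, 4))
```

Under the uniform-score assumption, the system-prompt share falls in proportion to the context length, which is the intuition behind attention decay. An exponent `k` below 1 raises that share while keeping the weights a valid distribution, and `k = 1` leaves them unchanged.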
Taxonomy
Topics: Persona Design and Applications
Methods: Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Multi-Head Attention · Weight Decay · Adam · Cosine Annealing · Byte Pair Encoding
