Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev, Arora

TL;DR
This paper investigates how prompt templates influence the safety alignment of LLMs after fine-tuning and introduces the PTST strategy, which improves safety preservation during model adaptation for specific tasks.
Contribution
It reveals the critical role of prompt templates in maintaining alignment and proposes the PTST method, a novel fine-tuning approach that enhances safety in LLMs.
Findings
Prompt templates significantly impact safety alignment post-fine-tuning.
PTST reduces unsafe behaviors in models across multiple benchmarks.
Fine-tuning without safety prompts at training, but including them at testing, improves safety.
Abstract
Public LLMs such as the Llama 2-Chat underwent alignment training and were considered safe. Recently Qi et al. [2024] reported that even benign fine-tuning on seemingly safe datasets can give rise to unsafe behaviors in the models. The current paper is about methods and best practices to mitigate such loss of alignment. We focus on the setting where a public model is fine-tuned before serving users for specific usage, where the model should improve on the downstream task while maintaining alignment. Through extensive experiments on several chat models (Meta's Llama 2-Chat, Mistral AI's Mistral 7B Instruct v0.2, and OpenAI's GPT-3.5 Turbo), this paper uncovers that the prompt templates used during fine-tuning and inference play a crucial role in preserving safety alignment, and proposes the ``Pure Tuning, Safe Testing'' (PTST) strategy -- fine-tune models without a safety prompt, but…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTaxation and Legal Issues
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Attention Is All You Need · Focus · Linear Layer · {Dispute@FaQ-s}How to file a dispute with Expedia? · Dropout · Layer Normalization · Cosine Annealing · 15 Ways to Contact How can i speak to someone at Delta Airlines · Attention Dropout
