A Closer Look at System Prompt Robustness

Norman Mu; Jonathan Lu; Michael Lavery; David Wagner

arXiv:2502.12197·cs.CL·February 19, 2025

Open Access

TL;DR

This paper investigates how robustly large language models adhere to system prompts, proposes new datasets and methods to improve that adherence, and evaluates fine-tuning and inference-time techniques.

Contribution

It introduces realistic evaluation datasets and assesses various fine-tuning and inference methods to enhance system prompt robustness in LLMs.

Findings

01 Fine-tuning with realistic data improves robustness.

02 Inference-time interventions like classifier-free guidance help.

03 Current techniques still fall short of full robustness.
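
To make the second finding concrete, here is a minimal sketch of the general idea behind classifier-free guidance applied to next-token logits. It is not the paper's implementation: the function names, the toy logits, and the choice of "without the system prompt" as the contrast input are all assumptions made for illustration. The sketch extrapolates from the logits a model produces without the system prompt toward the logits it produces with it.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cfg_logits(with_system, without_system, scale):
    """Hypothetical helper: combine two sets of next-token logits.

    with_system:    logits when the system prompt is in the context
    without_system: logits when the system prompt is omitted (assumed contrast)
    scale:          guidance strength; 1.0 returns with_system unchanged,
                    values above 1.0 push further toward the system prompt
    """
    if len(with_system) != len(without_system):
        raise ValueError("logit vectors must cover the same vocabulary")
    return [u + scale * (c - u) for c, u in zip(with_system, without_system)]


# Toy three-token vocabulary; these numbers are invented for illustration.
with_system = [2.0, 0.5, -1.0]     # system prompt favors token 0
without_system = [0.5, 2.0, -1.0]  # user request alone favors token 1

guided = cfg_logits(with_system, without_system, scale=1.5)
probs = softmax(guided)
```

With a scale of 1.0 the guided logits equal the system-prompt-conditioned logits, so the scale acts as a single knob for how strongly the system prompt should outweigh the rest of the context. In a real model each logit vector would come from a separate forward pass, which roughly doubles inference cost.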

Abstract

System prompts have emerged as a critical control surface for specifying the behavior of LLMs in chat and agent settings. Developers depend on system prompts to specify important context, output format, personalities, guardrails, content policies, and safety countermeasures, all of which require models to robustly adhere to the system prompt, especially when facing conflicting or adversarial user inputs. In practice, models often forget to consider relevant guardrails or fail to resolve conflicting demands between the system and the user. In this work, we study various methods for improving system prompt robustness by creating realistic new evaluation and fine-tuning datasets based on prompts collected from OpenAI's GPT Store and HuggingFace's HuggingChat. Our experiments assessing models with a panel of new and existing benchmarks show that performance can be considerably improved…

Taxonomy

Topics: Fault Detection and Control Systems