PSM: Prompt Sensitivity Minimization via LLM-Guided Black-Box Optimization

Huseein Jawad; Nicolas Brunel

arXiv:2511.16209·cs.CR·February 3, 2026

PSM: Prompt Sensitivity Minimization via LLM-Guided Black-Box Optimization

Huseein Jawad, Nicolas Brunel

PDF

Open Access

TL;DR

This paper proposes a lightweight, black-box optimization framework to harden system prompts in LLMs by adding protective layers, significantly reducing prompt leakage from adversarial attacks while maintaining task utility.

Contribution

It introduces a formal utility-constrained optimization approach using an LLM as an optimizer to generate shield prompts that minimize leakage without sacrificing performance.

Findings

01

Optimized SHIELDs significantly reduce prompt leakage against extraction attacks.

02

The method outperforms existing defenses in effectiveness.

03

It maintains high task utility with minimal overhead.

Abstract

System prompts are critical for guiding the behavior of Large Language Models (LLMs), yet they often contain proprietary logic or sensitive information, making them a prime target for extraction attacks. Adversarial queries can successfully elicit these hidden instructions, posing significant security and privacy risks. Existing defense mechanisms frequently rely on heuristics, incur substantial computational overhead, or are inapplicable to models accessed via black-box APIs. This paper introduces a novel framework for hardening system prompts through shield appending, a lightweight approach that adds a protective textual layer to the original prompt. Our core contribution is the formalization of prompt hardening as a utility-constrained optimization problem. We leverage an LLM-as-optimizer to search the space of possible SHIELDs, seeking to minimize a leakage metric derived from a…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdversarial Robustness in Machine Learning · Security and Verification in Computing · Topic Modeling