PARASITE: Conditional System Prompt Poisoning to Hijack LLMs

Viet Pham; Thai Le

arXiv:2505.16888·cs.CR·April 28, 2026

TL;DR

This paper introduces PARASITE, a method for poisoning third-party system prompts so that an LLM returns compromised responses only for specific target queries while keeping high utility on benign inputs, exposing a supply-chain vulnerability in public prompt marketplaces.

Contribution

PARASITE is a novel framework that optimizes system prompts to selectively hijack LLM responses in a black-box setting, bypassing standard defenses.

Findings

01. Achieves up to 70% F1 reduction on targeted queries
02. Maintains high utility on benign inputs
03. Evades standard prompt defenses

Abstract

Large Language Models (LLMs) are increasingly deployed via third-party system prompts downloaded from public marketplaces. We identify a critical supply-chain vulnerability: conditional system prompt poisoning, where an adversary injects a "sleeper agent" into a benign-looking prompt. Unlike traditional jailbreaks that aim for broad refusal-breaking, our proposed framework, PARASITE, optimizes system prompts to trigger LLMs to output targeted, compromised responses only for specific queries (e.g., "Who should I vote for the US President?") while maintaining high utility on benign inputs. Operating in a strict black-box setting without model weight access, PARASITE uses a two-stage optimization: a global semantic search followed by a greedy lexical refinement. Tested on open-source models and commercial APIs (GPT-4o-mini, GPT-3.5), PARASITE achieves up to 70% F1…
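The abstract reports the effect as an F1 reduction on targeted queries alongside preserved utility on benign inputs. The sketch below shows one way someone auditing a downloaded system prompt might measure that gap. It is an illustration under stated assumptions, not the authors' evaluation code: the listing does not say how F1 is computed or whether the reduction is relative or absolute, so token-overlap F1 and a relative drop are assumed here, and the names `token_f1`, `mean_f1`, `relative_f1_drop`, and the `answer` callable are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical model interface: (system_prompt, user_query) -> response text.
AnswerFn = Callable[[str, str], str]


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two strings (an assumed metric definition)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_f1(answer: AnswerFn, system_prompt: str,
            queries: Sequence[str], references: Sequence[str]) -> float:
    """Average token F1 of the model's answers under one system prompt."""
    scores = [token_f1(answer(system_prompt, q), r)
              for q, r in zip(queries, references)]
    return sum(scores) / len(scores)


def relative_f1_drop(answer: AnswerFn, clean_prompt: str, suspect_prompt: str,
                     queries: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of clean-prompt F1 lost when the suspect prompt is used."""
    clean = mean_f1(answer, clean_prompt, queries, references)
    suspect = mean_f1(answer, suspect_prompt, queries, references)
    return 0.0 if clean == 0 else (clean - suspect) / clean
```

Running `relative_f1_drop` once on a set of sensitive queries and once on a set of ordinary queries gives two numbers to compare. A large drop on the first set together with a near-zero drop on the second matches the conditional pattern the abstract describes.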

Code & Models

Repositories

vietph34/PARASITE (GitHub)