Highlight & Summarize: RAG without the jailbreaks

Giovanni Cherubin; Andrew Paverd

arXiv:2508.02872·cs.CL·February 16, 2026

Highlight & Summarize: RAG without the jailbreaks

Giovanni Cherubin, Andrew Paverd

PDF

TL;DR

Highlight & Summarize (H&S) is a novel retrieval-augmented generation approach that enhances security against jailbreaks by never revealing user questions to the LLM, thus preventing malicious prompt injections.

Contribution

This paper introduces H&S, a new RAG design pattern that prevents jailbreaks by separating question highlighting and answer summarization, avoiding direct question exposure to the LLM.

Findings

01

H&S achieves comparable or better answer quality than standard RAG.

02

H&S effectively prevents jailbreak attacks by design.

03

Evaluations show H&S maintains high relevance and correctness.

Abstract

Preventing jailbreaking and model hijacking of Large Language Models (LLMs) is an important yet challenging task. When interacting with a chatbot, malicious users can input specially crafted prompts that cause the LLM to generate undesirable content or perform a different task from its intended purpose. Existing systems attempt to mitigate this by hardening the LLM's system prompt or using additional classifiers to detect undesirable content or off-topic conversations. However, these probabilistic approaches are relatively easy to bypass due to the very large space of possible inputs and undesirable outputs. We present and evaluate Highlight & Summarize (H&S), a new design pattern for retrieval-augmented generation (RAG) systems that prevents these attacks by design. The core idea is to perform the same task as a standard RAG pipeline (i.e., to provide natural language answers to…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.