Privacy Guard & Token Parsimony by Prompt and Context Handling and LLM Routing

Alessio Langiu

arXiv:2603.28972 · cs.CR · April 1, 2026

TL;DR

This paper introduces a privacy-preserving framework for LLMs that reduces operational costs and leakage risks by using local summarisation, prompt optimisation, and context management techniques.

Contribution

The paper formalises the inseparability of context management and privacy management, and proposes a holistic on-premise privacy guard with novel prompt routing and compression methods.

Findings

1. 45% reduction in operational costs
2. 100% success in redacting personal secrets
3. 85% preference for optimized responses over raw baselines

Abstract

The large-scale adoption of Large Language Models (LLMs) forces a trade-off between operational cost (OpEx) and data privacy. Current routing frameworks reduce costs but ignore prompt sensitivity, exposing users and institutions to the risk of leakage to third-party cloud providers. We formalise the "Inseparability Paradigm": advanced context management intrinsically coincides with privacy management. We propose a local "Privacy Guard", a holistic contextual observer powered by an on-premise Small Language Model (SLM), that performs abstractive summarisation and Automatic Prompt Optimisation (APO) to decompose prompts into focused sub-tasks, re-routing high-risk queries to Zero-Trust or NDA-covered models. This dual mechanism simultaneously eliminates sensitive inference vectors (Zero Leakage) and reduces cloud token payloads (OpEx Reduction). A LIFO-based context compacting…
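
The guard-then-route idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the paper uses an on-premise SLM for abstractive summarisation and prompt optimisation, whereas this sketch substitutes two simple regular expressions as a stand-in sensitivity detector. The names `SENSITIVE_PATTERNS`, `RoutingDecision`, `redact`, and `route`, and the target labels `"cloud"` and `"trusted"`, are all hypothetical.

```python
import re
from dataclasses import dataclass

# ASSUMPTION: regex patterns stand in for the SLM's sensitivity judgement.
# A real guard would rely on a local model, not on pattern matching.
SENSITIVE_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


@dataclass
class RoutingDecision:
    target: str     # hypothetical labels: "cloud" or "trusted"
    payload: str    # text that would leave the local guard
    findings: list  # kinds of sensitive data detected


def redact(prompt):
    """Replace each sensitive match with a placeholder such as [EMAIL]."""
    findings = []
    for kind, pattern in SENSITIVE_PATTERNS.items():
        prompt, count = pattern.subn(f"[{kind}]", prompt)
        if count:
            findings.append(kind)
    return prompt, findings


def route(prompt):
    """Send prompts with findings to the trusted target, others to the cloud.

    The payload is redacted in both cases, so the trusted target also
    receives placeholders rather than the raw values.
    """
    redacted, findings = redact(prompt)
    target = "trusted" if findings else "cloud"
    return RoutingDecision(target, redacted, findings)
```

Calling `route("Email alice@example.com about the invoice")` in this sketch yields a decision whose target is `"trusted"` and whose payload has the address replaced by `[EMAIL]`, while a prompt with no matches is passed to `"cloud"` unchanged.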
