Privacy Guard & Token Parsimony by Prompt and Context Handling and LLM Routing
Alessio Langiu

TL;DR
This paper introduces a privacy-preserving framework for LLMs that reduces operational costs and leakage risks by using local summarisation, prompt optimisation, and context management techniques.
Contribution
It formalises the inseparability of context and privacy management and proposes a holistic on-premise privacy guard with novel prompt routing and compression methods.
Findings
45% reduction in operational costs
100% success in redacting personal secrets
85% preference for optimised responses over raw baselines
Abstract
The large-scale adoption of Large Language Models (LLMs) forces a trade-off between operational cost (OpEx) and data privacy. Current routing frameworks reduce costs but ignore prompt sensitivity, exposing users and institutions to leakage risks towards third-party cloud providers. We formalise the "Inseparability Paradigm": advanced context management intrinsically coincides with privacy management. We propose a local "Privacy Guard" -- a holistic contextual observer powered by an on-premise Small Language Model (SLM) -- that performs abstractive summarisation and Automatic Prompt Optimisation (APO) to decompose prompts into focused sub-tasks, re-routing high-risk queries to Zero-Trust or NDA-covered models. This dual mechanism simultaneously eliminates sensitive inference vectors (Zero Leakage) and reduces cloud token payloads (OpEx Reduction). A LIFO-based context compacting…
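The redact-then-route idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract describes an on-premise SLM acting as the contextual observer, whereas the sketch below swaps in simple regular expressions as a stand-in detector. The pattern set, the `guard` function, the `RoutedPrompt` type, the `risk_threshold` parameter, and the target labels `public_cloud` and `trusted_model` are all hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns standing in for the SLM-based sensitivity detector
# described in the abstract; a real guard would not rely on regexes alone.
SENSITIVE_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}


@dataclass
class RoutedPrompt:
    target: str    # hypothetical labels: "public_cloud" or "trusted_model"
    payload: str   # redacted text that would leave the premises
    findings: int  # number of sensitive spans replaced


def guard(prompt: str, risk_threshold: int = 2) -> RoutedPrompt:
    """Redact sensitive spans, then pick a destination by finding count."""
    redacted = prompt
    findings = 0
    for label, pattern in SENSITIVE_PATTERNS.items():
        # subn returns the new string and the number of substitutions made.
        redacted, count = pattern.subn(f"[{label}]", redacted)
        findings += count
    # Assumption: prompts with many sensitive spans count as high risk and
    # are re-routed to a Zero-Trust or NDA-covered model.
    target = "trusted_model" if findings >= risk_threshold else "public_cloud"
    return RoutedPrompt(target=target, payload=redacted, findings=findings)


if __name__ == "__main__":
    print(guard("Summarise the attached meeting notes."))
    print(guard("Email alice@example.com and bob@example.org "
                "about IBAN IT60X0542811101000000123456"))
```

Redaction also shortens the outgoing payload slightly, which hints at why the abstract treats privacy handling and token reduction as one mechanism; the abstractive summarisation and prompt decomposition steps are not modelled here.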
