PromptScreen: Efficient Jailbreak Mitigation Using Semantic Linear Classification in a Multi-Staged Pipeline
Akshaj Prashanth Rao, Advait Singh, Saumya Kumaar Saksena, Dhruv Kumar

TL;DR
PromptScreen introduces a lightweight, multi-stage defense system using semantic linear classification to effectively mitigate jailbreak attacks on large language models, significantly improving accuracy and reducing latency.
Contribution
The paper presents PromptScreen, a novel multi-stage pipeline with a semantic filter based on text normalization, TF-IDF, and Linear SVM, achieving high accuracy and efficiency in jailbreaking attack mitigation.
Findings
Achieves 93.4% accuracy and 96.5% specificity in detection.
Reduces attack throughput and latency by over 10 times compared to previous methods.
Successfully classifies over 30,000 labeled prompts with high robustness.
Abstract
Prompt injection and jailbreaking attacks pose persistent security challenges to large language model (LLM)-based systems. We present PromptScreen, an efficient and systematically evaluated defense architecture that mitigates these threats through a lightweight, multi-stage pipeline. Its core component is a semantic filter based on text normalization, TF-IDF representations, and a Linear SVM classifier. Despite its simplicity, this module achieves 93.4% accuracy and 96.5% specificity on held-out data, substantially reducing attack throughput while incurring negligible computational overhead. Building on this efficient foundation, the full pipeline integrates complementary detection and mitigation mechanisms that operate at successive stages, providing strong robustness with minimal latency. In comparative experiments, our SVM-based configuration improves overall accuracy from 35.1% to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Security and Verification in Computing · Advanced Malware Detection Techniques
