SafeGPT: Preventing Data Leakage and Unethical Outputs in Enterprise LLM Use
Pratyush Desai, Luoxi Tang, Yuqiao Meng, Zhaohan Xi

TL;DR
SafeGPT is a comprehensive system designed to prevent data leakage and unethical outputs in enterprise LLM applications through input detection, output moderation, and human feedback.
Contribution
It introduces a novel two-sided guardrail system combining detection, moderation, and human-in-the-loop feedback for enhanced LLM safety in enterprise settings.
Findings
Reduces data leakage risk effectively
Decreases biased and unethical outputs
Maintains user satisfaction
Abstract
Large Language Models (LLMs) are transforming enterprise workflows, but they introduce security and ethics challenges when employees inadvertently share confidential data or generate policy-violating content. This paper proposes SafeGPT, a two-sided guardrail system that prevents sensitive data leakage and unethical outputs. SafeGPT integrates input-side detection and redaction, output-side moderation and reframing, and human-in-the-loop feedback. Experiments show that SafeGPT reduces data leakage risk and biased outputs while maintaining user satisfaction.
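To make the two-sided design concrete, here is a minimal, hypothetical sketch of how such a pipeline could be wired. Every name in it (`SENSITIVE_PATTERNS`, `BLOCKED_TERMS`, `redact`, `moderate`, `Guardrail`, `record_feedback`) is an illustrative assumption and not SafeGPT's actual implementation. Regex redaction stands in for input-side detection, and a keyword blocklist stands in for a real moderation model. Flagged responses are withheld rather than reframed, and the human-in-the-loop step is modeled as a simple feedback log.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical input-side detectors for confidential data (illustrative only).
SENSITIVE_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}

# Hypothetical output-side policy: a toy blocklist in place of a moderation model.
BLOCKED_TERMS = {"confidential", "internal only"}


def redact(prompt: str) -> tuple[str, list[str]]:
    """Replace sensitive spans with [LABEL] placeholders; return text and labels found."""
    found = []
    for label, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(prompt):
            found.append(label)
            prompt = pattern.sub(f"[{label}]", prompt)
    return prompt, found


def moderate(response: str) -> bool:
    """Return True if the response passes the toy output policy."""
    lowered = response.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


@dataclass
class Guardrail:
    llm: Callable[[str], str]  # any prompt -> response function (assumed interface)
    feedback_log: list[dict] = field(default_factory=list)

    def ask(self, prompt: str) -> str:
        # Input side: redact before the prompt ever reaches the model.
        safe_prompt, labels = redact(prompt)
        response = self.llm(safe_prompt)
        # Output side: withhold (this sketch does not reframe) flagged responses.
        if not moderate(response):
            response = "[Response withheld: flagged by output moderation]"
        # Human-in-the-loop: keep a record that reviewers can later approve or reject.
        self.feedback_log.append(
            {"prompt": safe_prompt, "redacted": labels, "response": response}
        )
        return response

    def record_feedback(self, index: int, approved: bool) -> None:
        """Attach a reviewer's verdict to a logged interaction."""
        self.feedback_log[index]["approved"] = approved


if __name__ == "__main__":
    guard = Guardrail(llm=lambda p: f"You said: {p}")
    print(guard.ask("Email alice@corp.com about SSN 123-45-6789"))
    # prints "You said: Email [EMAIL] about SSN [SSN]"
```

In a real deployment the regexes and blocklist would presumably be replaced by learned detectors and a moderation classifier, and the feedback log would feed back into tuning those components. The sketch only shows where each of the three stages sits relative to the model call.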
Taxonomy
Topics: Digital and Cyber Forensics · Web Application Security Vulnerabilities · Advanced Malware Detection Techniques
