SafeGPT: Preventing Data Leakage and Unethical Outputs in Enterprise LLM Use

Pratyush Desai; Luoxi Tang; Yuqiao Meng; Zhaohan Xi

arXiv:2601.06366 · cs.CR · May 18, 2026

TL;DR

SafeGPT is a two-sided guardrail system that aims to prevent data leakage and unethical outputs in enterprise LLM applications through input-side detection and redaction, output-side moderation and reframing, and human-in-the-loop feedback.

Contribution

The paper introduces a two-sided guardrail system that combines input-side detection, output-side moderation, and human-in-the-loop feedback to improve LLM safety in enterprise settings.

Findings

01 Reduces data leakage risk effectively

02 Decreases biased and unethical outputs

03 Maintains user satisfaction

Abstract

Large Language Models (LLMs) are transforming enterprise workflows but introduce security and ethics challenges when employees inadvertently share confidential data or generate policy-violating content. This paper proposes SafeGPT, a two-sided guardrail system that prevents sensitive data leakage and unethical outputs. SafeGPT integrates input-side detection and redaction, output-side moderation and reframing, and human-in-the-loop feedback. Experiments demonstrate that SafeGPT effectively reduces data leakage risk and biased outputs while maintaining user satisfaction.
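The three stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the regex patterns, the blocked-term list, and the names `redact_input`, `moderate_output`, `ReviewQueue`, and `guarded_call` are all hypothetical, and the output stage simply withholds a flagged response rather than reframing it as the paper describes.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detectors; a real system would need far broader coverage.
SENSITIVE_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
# Placeholder policy list standing in for an output-side moderation model.
BLOCKED_TERMS = {"forbidden-term"}
WITHHELD = "[response withheld pending review]"


def redact_input(prompt):
    """Input side: replace each detected sensitive span with a label tag."""
    found = []
    for label, pattern in SENSITIVE_PATTERNS.items():
        prompt, count = pattern.subn(f"[{label}]", prompt)
        if count:
            found.append(label)
    return prompt, found


def moderate_output(text):
    """Output side: withhold text containing a blocked term (no reframing here)."""
    if any(term in text.lower() for term in BLOCKED_TERMS):
        return WITHHELD, True
    return text, False


@dataclass
class ReviewQueue:
    """Human-in-the-loop stand-in: collects flagged exchanges for later review."""
    items: list = field(default_factory=list)

    def submit(self, prompt, response, reasons):
        self.items.append({"prompt": prompt, "response": response, "reasons": reasons})


def guarded_call(prompt, model, queue):
    """Run one prompt through redaction, the model, and output moderation."""
    safe_prompt, labels = redact_input(prompt)
    raw = model(safe_prompt)  # the model only ever sees the redacted prompt
    final, flagged = moderate_output(raw)
    reasons = labels + (["OUTPUT_POLICY"] if flagged else [])
    if reasons:
        queue.submit(safe_prompt, final, reasons)
    return final
```

Here `model` is any callable that maps a prompt string to a response string, so the sketch can be exercised with a stub before wiring in a real LLM client. Only the redacted prompt is stored in the review queue, so the queue itself does not become a second place where the sensitive data could leak.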

Taxonomy

Topics: Digital and Cyber Forensics · Web Application Security Vulnerabilities · Advanced Malware Detection Techniques