A Framework for Real-time Safeguarding the Text Generation of Large Language Model

Ximing Dong; Dayi Lin; Shaowei Wang; Ahmed E. Hassan

arXiv:2404.19048·cs.CL·May 22, 2025

Open Access

TL;DR

This paper introduces LLMSafeGuard, a lightweight, real-time framework that uses an external validator during decoding to reject unsafe outputs, improving both the safety and the efficiency of large language model text generation.

Contribution

The paper presents a novel similarity-based validation method and a context-wise timing selection strategy for real-time safeguarding of LLMs, without the need to train control models.
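As a rough illustration of what similarity-based validation could look like, the sketch below rejects a candidate text when it is too similar to any example from a small set of known-unsafe demonstrations. This is not the paper's implementation: the bag-of-words embedding, the function names (`embed`, `cosine`, `is_valid`), and the 0.5 threshold are all assumptions made for the example, and a real system would more plausibly use a learned sentence encoder.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real text encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_valid(candidate, unsafe_examples, threshold=0.5):
    """Accept the candidate only if it stays below the similarity threshold
    for every unsafe demonstration example (threshold is hypothetical)."""
    vec = embed(candidate)
    return all(cosine(vec, embed(ex)) < threshold for ex in unsafe_examples)


unsafe_examples = ["you are a stupid idiot"]
rejected = is_valid("you are a stupid idiot", unsafe_examples)
accepted = is_valid("the weather is nice today", unsafe_examples)
```

The appeal of this style of validator is that adding a new constraint only means adding demonstration examples, with no model to train.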

Findings

01. Reduces toxic outputs by at least 38.6% in detoxification tasks.

02. Cuts inference time by at least 24.2% while maintaining effectiveness.

03. Outperforms state-of-the-art baselines in safety and efficiency.

Abstract

Large Language Models (LLMs) have significantly advanced natural language processing (NLP) tasks but also pose ethical and societal risks due to their propensity to generate harmful content. Existing methods have limitations, including the need to train specific control models and to intervene proactively during text generation, which lead to quality degradation and increased computational overhead. To mitigate these limitations, we propose LLMSafeGuard, a lightweight real-time framework that integrates an external validator into decoding, rejecting unsafe outputs while allowing valid ones. We introduce a similarity-based validation approach, simplifying constraint introduction and eliminating the need for control model training. Additionally, LLMSafeGuard employs a context-wise timing selection strategy, intervening in LLM generation only when necessary. We evaluate LLMSafeGuard on detoxification…
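To make the idea of validating during decoding "only when necessary" more concrete, here is a minimal sketch of a generation loop that calls an external validator at an adaptive interval. The abstract does not spell out the actual timing rule, so the policy used here (double the gap after a clean check, reset it after a rejection) is purely an illustrative assumption, as are the `propose` and `is_unsafe` interfaces and every parameter name.

```python
def generate(propose, is_unsafe, max_steps=8, min_gap=1, max_gap=4):
    """Decode token by token, validating candidates only every `gap` steps.

    propose(prefix)   -> candidate next tokens, best first (hypothetical interface)
    is_unsafe(tokens) -> True if the sequence violates the safety constraint
    Returns the generated tokens and the number of validator calls made.
    """
    tokens, gap, since_check, checks = [], min_gap, 0, 0
    for _ in range(max_steps):
        candidates = propose(tokens)
        if not candidates:
            break
        since_check += 1
        if since_check >= gap:
            checks += 1
            since_check = 0
            safe = [c for c in candidates if not is_unsafe(tokens + [c])]
            if len(safe) < len(candidates):
                gap = min_gap                # a rejection occurred: check again soon
            else:
                gap = min(max_gap, gap * 2)  # everything passed: back off
            if not safe:
                break                        # no valid continuation in this sketch
            candidates = safe
        tokens.append(candidates[0])
    return tokens, checks


# Toy stand-ins for the LLM and the validator (both hypothetical).
SCRIPT = [["you"], ["are"], ["stupid", "kind"], ["and"], ["smart"]]


def propose(prefix):
    return SCRIPT[len(prefix)] if len(prefix) < len(SCRIPT) else []


def is_unsafe(tokens):
    return "stupid" in tokens


tokens, checks = generate(propose, is_unsafe)
```

In this toy run the validator is consulted on three of the five steps rather than on all of them, and the unsafe top candidate is replaced by the next valid one. The sketch does not roll back tokens accepted during unchecked steps; it simply stops if a later check finds no valid candidate, which a fuller design would need to handle.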

Taxonomy

Topics: Topic Modeling