Adaptive Content Restriction for Large Language Models via Suffix Optimization

Yige Li, Peihai Jiang, Jun Sun, Peng Shu, Tianming Liu, and Zhen Xiang

arXiv:2508.01198 · cs.CL · August 5, 2025

Open Access

TL;DR

This paper introduces Adaptive Content Restriction (AdaCoRe), the task of preventing deployed large language models from generating restricted terms without fine-tuning. It proposes a lightweight suffix optimization method (SOP) for this task and evaluates it on a new benchmark across multiple models.

Contribution

It proposes the first suffix optimization technique for adaptive content restriction, enabling targeted term blocking without retraining models, and introduces a comprehensive benchmark for evaluation.

Findings

01. SOP outperforms baselines in restriction effectiveness across multiple models.

02. SOP maintains output quality while restricting targeted terms.

03. SOP is effective in real-world online platform scenarios.

Abstract

Large Language Models (LLMs) have demonstrated significant success across diverse applications. However, enforcing content restrictions remains a significant challenge due to their expansive output space. One aspect of content restriction is preventing LLMs from generating harmful content via model alignment approaches such as supervised fine-tuning (SFT). Yet, the need for content restriction may vary significantly across user groups, change rapidly over time, and not always align with general definitions of harmfulness. Applying SFT to each of these specific use cases is impractical due to the high computational, data, and storage demands. Motivated by this need, we propose a new task called Adaptive Content Restriction (AdaCoRe), which focuses on lightweight strategies (methods without model fine-tuning) to prevent deployed LLMs from generating restricted terms for…
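
To make the task concrete, the following is a minimal sketch of how one might check whether model outputs contain restricted terms and how a restriction suffix could be attached to a prompt. It is not the authors' method or benchmark: the function names (`contains_restricted`, `violation_rate`, `with_suffix`), the whole-word, case-insensitive matching rule, and the plain string concatenation are all assumptions made for illustration.

```python
import re
from typing import Iterable, List


def contains_restricted(text: str, restricted_terms: Iterable[str]) -> bool:
    """Return True if any restricted term occurs in `text` as a whole word.

    Matching is case-insensitive and uses regex word boundaries, which is
    an assumption of this sketch; it suits terms made of word characters.
    """
    for term in restricted_terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False


def violation_rate(outputs: List[str], restricted_terms: List[str]) -> float:
    """Fraction of outputs containing at least one restricted term."""
    if not outputs:
        return 0.0
    hits = sum(contains_restricted(o, restricted_terms) for o in outputs)
    return hits / len(outputs)


def with_suffix(prompt: str, suffix: str) -> str:
    """Append a restriction suffix to a prompt (hypothetical helper).

    How the suffix is found is out of scope here; this only shows
    where it would be attached.
    """
    return f"{prompt} {suffix}"
```

In this framing, a lower `violation_rate` on outputs generated from suffixed prompts would indicate more effective restriction, under the matching assumptions stated above.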

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Artificial Intelligence in Healthcare and Education