Adaptive Content Restriction for Large Language Models via Suffix Optimization
Yige Li, Peihai Jiang, Jun Sun, Peng Shu, Tianming Liu, and Zhen Xiang

TL;DR
This paper introduces Adaptive Content Restriction (AdaCoRe), the task of preventing deployed large language models from generating restricted terms without fine-tuning, and proposes Suffix Optimization (SOP), a lightweight suffix-based method for it, evaluated on a new benchmark across multiple models.
Contribution
The paper proposes the first suffix optimization technique for adaptive content restriction, which blocks targeted terms without retraining the model, and introduces a comprehensive benchmark for evaluating it.
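To make "suffix optimization" concrete, here is a minimal, hypothetical sketch of the general idea: a fixed-length suffix is appended to every user prompt, and its tokens are chosen by greedy coordinate search to minimize a black-box loss. The loss, the candidate vocabulary, and the names `optimize_suffix`, `build_prompt`, and `toy_loss` are illustrative assumptions, not the paper's SOP objective or search procedure. In practice the loss would be computed from the target LLM, for example by penalizing restricted terms while rewarding output quality.

```python
from typing import Callable, List, Sequence


def optimize_suffix(
    vocab: Sequence[str],
    loss: Callable[[List[str]], float],
    length: int = 4,
    rounds: int = 3,
) -> List[str]:
    """Greedy coordinate search over suffix tokens (hypothetical sketch).

    Each round visits every suffix position and swaps in the vocabulary
    token that lowers the loss, keeping the other positions fixed.
    """
    suffix = [vocab[0]] * length  # start from a trivial suffix
    best = loss(suffix)
    for _ in range(rounds):
        improved = False
        for pos in range(length):
            for tok in vocab:
                candidate = suffix[:pos] + [tok] + suffix[pos + 1:]
                value = loss(candidate)
                if value < best:
                    suffix, best, improved = candidate, value, True
        if not improved:
            break  # no single-token swap helps; stop early
    return suffix


def build_prompt(user_prompt: str, suffix: List[str]) -> str:
    # The optimized suffix is simply appended to the user's prompt.
    return user_prompt + " " + " ".join(suffix)


# Toy stand-in for an LLM-based loss (assumption): lower is better when the
# suffix contains "avoid" and "politely". A real loss would query the model.
def toy_loss(suffix: List[str]) -> float:
    return -float("avoid" in suffix) - float("politely" in suffix)


if __name__ == "__main__":
    vocab = ["please", "avoid", "politely", "answer"]
    suffix = optimize_suffix(vocab, toy_loss, length=2)
    print(build_prompt("Describe the product.", suffix))
```

With the toy loss, the search settles on `["avoid", "politely"]` after two rounds. Greedy coordinate search is used here only because it needs nothing beyond a loss oracle; a method with access to model gradients could rank candidate tokens far more cheaply than trying every vocabulary entry.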
Findings
SOP outperforms baselines in restriction effectiveness across multiple models
SOP maintains output quality while restricting targeted terms
SOP is effective in real-world online platform scenarios
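One way to score "restriction effectiveness" is to count how many generated outputs avoid every restricted term. The sketch below assumes case-insensitive whole-word matching and a simple success-rate metric; both are illustrative assumptions rather than the paper's evaluation protocol, and `contains_restricted` and `restriction_rate` are hypothetical names.

```python
import re
from typing import Iterable, List


def contains_restricted(text: str, terms: Iterable[str]) -> bool:
    """True if any restricted term appears as a whole word or phrase (case-insensitive)."""
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False


def restriction_rate(outputs: List[str], terms: List[str]) -> float:
    """Fraction of outputs that contain none of the restricted terms."""
    if not outputs:
        return 0.0
    clean = sum(not contains_restricted(o, terms) for o in outputs)
    return clean / len(outputs)


if __name__ == "__main__":
    terms = ["discount", "free shipping"]
    outputs = [
        "Our store offers free shipping on all orders.",
        "The product ships within two days.",
        "Ask about our Discount program.",
        "Discounted items are final sale.",
    ]
    print(restriction_rate(outputs, terms))
```

Here two of the four outputs count as clean, giving a rate of 0.5. Note that whole-word matching lets the inflected form "Discounted" slip through; a stricter evaluation might add stemming or substring matching, at the cost of more false positives.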
Abstract
Large Language Models (LLMs) have demonstrated significant success across diverse applications. However, enforcing content restrictions on them remains challenging because of their expansive output space. One aspect of content restriction is preventing LLMs from generating harmful content via model alignment approaches such as supervised fine-tuning (SFT). Yet the need for content restriction may vary significantly across user groups, change rapidly over time, and may not always align with general definitions of harmfulness. Applying SFT to each of these specific use cases is impractical due to its high computational, data, and storage demands. Motivated by this need, we propose a new task called Adaptive Content Restriction (AdaCoRe), which focuses on lightweight strategies (methods that require no model fine-tuning) to prevent deployed LLMs from generating restricted terms for…
