A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection

Gabriel Chua; Shing Yee Chan; Shaun Khoo

arXiv:2411.12946·cs.CL·April 10, 2025

TL;DR

This paper presents a flexible, data-free methodology for developing off-topic prompt guardrails for large language models, utilizing synthetic datasets generated by LLMs to improve safety and reduce false positives.

Contribution

It introduces a novel, adaptable guardrail development approach that does not require real-world data and generalizes across misuse categories, with open-sourced resources for the community.

Findings

01 Outperforms heuristic guardrails in off-topic detection

02 Generalizes to jailbreak and harmful prompt detection

03 Provides open-source datasets and models

Abstract

Large Language Models (LLMs) are prone to off-topic misuse, where users may prompt these models to perform tasks beyond their intended scope. Current guardrails, which often rely on curated examples or custom classifiers, suffer from high false-positive rates, limited adaptability, and the impracticality of requiring real-world data that is not available in pre-production. In this paper, we introduce a flexible, data-free guardrail development methodology that addresses these challenges. By thoroughly defining the problem space qualitatively and passing this to an LLM to generate diverse prompts, we construct a synthetic dataset to benchmark and train off-topic guardrails that outperform heuristic approaches. Additionally, by framing the task as classifying whether the user prompt is relevant with respect to the system prompt, our guardrails effectively generalize to other misuse…
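The framing in the abstract, classifying whether a user prompt is relevant with respect to a system prompt, can be illustrated with a small sketch. Everything named here (`GuardrailExample`, `jaccard_relevance`, `is_off_topic`, the stopword list, and the 0.1 threshold) is a hypothetical choice made for illustration. The word-overlap score is only a stand-in for the kind of heuristic baseline the abstract says the trained guardrails outperform; it is not the authors' model or their benchmark code.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical stopword list, kept tiny for illustration only.
STOPWORDS = {
    "a", "an", "the", "and", "or", "to", "of", "about", "you", "are",
    "your", "do", "what", "me", "is", "in", "for", "on",
}


@dataclass(frozen=True)
class GuardrailExample:
    """One labeled pair: is the user prompt off-topic for this system prompt?"""
    system_prompt: str
    user_prompt: str
    off_topic: bool


def _tokens(text: str) -> set[str]:
    """Lowercase alphabetic tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def jaccard_relevance(system_prompt: str, user_prompt: str) -> float:
    """Jaccard overlap between the two prompts' token sets, in [0, 1]."""
    a, b = _tokens(system_prompt), _tokens(user_prompt)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_off_topic(system_prompt: str, user_prompt: str, threshold: float = 0.1) -> bool:
    """Flag the user prompt as off-topic when overlap falls below the threshold.

    The threshold is an assumed value, not one taken from the paper.
    """
    return jaccard_relevance(system_prompt, user_prompt) < threshold


def accuracy(examples: list[GuardrailExample], threshold: float = 0.1) -> float:
    """Fraction of labeled pairs the heuristic classifies correctly."""
    if not examples:
        return 0.0
    correct = sum(
        is_off_topic(e.system_prompt, e.user_prompt, threshold) == e.off_topic
        for e in examples
    )
    return correct / len(examples)
```

A heuristic like this fails whenever a relevant prompt shares no vocabulary with the system prompt, or an irrelevant one happens to share some, which is one plausible source of the false positives the abstract attributes to existing approaches. Treating the input as a (system prompt, user prompt) pair means the same interface could in principle be reused with a learned classifier in place of `is_off_topic`.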

Code & Models

Repositories

https://huggingface.co/datasets/gabrielchua/off-topic
none · Official

Models

Datasets

gabrielchua/off-topic
dataset · 66 dl

Taxonomy

Topics: Advanced Malware Detection Techniques · Web Data Mining and Analysis · Software Engineering Research