Towards Safer Chatbots: Automated Policy Compliance Evaluation of Custom GPTs

David Rodriguez; William Seymour; Jose M. Del Alamo; Jose Such

arXiv:2502.01436 · cs.CL · December 22, 2025

TL;DR

This paper introduces an automated, scalable method for evaluating the policy compliance of custom GPT chatbots, revealing significant policy violations and highlighting limitations in current review processes.

Contribution

It presents a novel black-box, automated approach combining GPT discovery, red-teaming prompts, and LLM-based judgment to assess compliance of custom GPTs with usage policies.
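The black-box loop described above (send policy-driven prompts to each discovered GPT, then have a judge label each response) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `Verdict`, `evaluate_gpts`, `violation_rate`, and the `ask` and `judge` callables are hypothetical names, and counting a GPT as violating when any single response is flagged is an assumption made here for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Verdict:
    """One judged interaction with one GPT (hypothetical record type)."""
    gpt_id: str
    prompt: str
    response: str
    violates: bool


def evaluate_gpts(
    gpt_ids: Iterable[str],
    prompts: Iterable[str],
    ask: Callable[[str, str], str],
    judge: Callable[[str, str], bool],
) -> List[Verdict]:
    """Send every red-teaming prompt to every GPT and judge each response.

    `ask(gpt_id, prompt)` stands in for black-box interaction with a GPT;
    `judge(prompt, response)` stands in for an LLM-as-a-judge that returns
    True when the response violates the policy. Both are assumptions.
    """
    prompt_list = list(prompts)  # allow reuse across GPTs
    verdicts: List[Verdict] = []
    for gpt_id in gpt_ids:
        for prompt in prompt_list:
            response = ask(gpt_id, prompt)
            verdicts.append(
                Verdict(gpt_id, prompt, response, judge(prompt, response))
            )
    return verdicts


def violation_rate(verdicts: List[Verdict]) -> float:
    """Fraction of GPTs with at least one response judged as violating.

    The "at least one" aggregation rule is an assumption of this sketch.
    """
    seen = {v.gpt_id for v in verdicts}
    flagged = {v.gpt_id for v in verdicts if v.violates}
    return len(flagged) / len(seen) if seen else 0.0
```

Passing `ask` and `judge` as callables keeps the loop independent of any particular chatbot interface or judge model, so it can be exercised with simple stubs before real services are wired in.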

Findings

01  58.7% of evaluated GPTs violate policies

02  High accuracy (F1=0.975) in violation detection

03  Violations mainly stem from model-level behavior
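For readers unfamiliar with the metric in the second finding, the standard definition of the F1 score is given below. The symbols $TP$, $FP$, and $FN$ (true positives, false positives, false negatives) are introduced here for illustration; the counts behind the reported 0.975 are not stated on this page.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R}
```

Because F1 is the harmonic mean of precision $P$ and recall $R$, a value near 1 requires both to be high.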

Abstract

User-configured chatbots built on top of large language models are increasingly available through centralized marketplaces such as OpenAI's GPT Store. While these platforms enforce usage policies intended to prevent harmful or inappropriate behavior, the scale and opacity of customized chatbots make systematic policy enforcement challenging. As a result, policy-violating chatbots remain publicly accessible despite existing review processes. This paper presents a fully automated method for evaluating the compliance of Custom GPTs with their marketplace's usage policy using black-box interaction. The method combines large-scale GPT discovery, policy-driven red-teaming prompts, and automated compliance assessment using an LLM-as-a-judge. We focus on three policy-relevant domains explicitly addressed in OpenAI's usage policies: Romantic, Cybersecurity, and Academic GPTs. We validate…

Taxonomy

Topics: AI in Service Interactions