Towards Safer Chatbots: Automated Policy Compliance Evaluation of Custom GPTs
David Rodriguez, William Seymour, Jose M. Del Alamo, Jose Such

TL;DR
This paper introduces an automated, scalable method for evaluating the policy compliance of custom GPT chatbots, revealing significant policy violations and highlighting limitations in current review processes.
Contribution
It presents a novel black-box, automated approach combining GPT discovery, red-teaming prompts, and LLM-based judgment to assess compliance of custom GPTs with usage policies.
Findings
58.7% of evaluated GPTs violate policies
High accuracy (F1=0.975) in violation detection
Violations mainly stem from model-level behavior
Abstract
User-configured chatbots built on top of large language models are increasingly available through centralized marketplaces such as OpenAI's GPT Store. While these platforms enforce usage policies intended to prevent harmful or inappropriate behavior, the scale and opacity of customized chatbots make systematic policy enforcement challenging. As a result, policy-violating chatbots continue to remain publicly accessible despite existing review processes. This paper presents a fully automated method for evaluating the compliance of Custom GPTs with its marketplace usage policy using black-box interaction. The method combines large-scale GPT discovery, policy-driven red-teaming prompts, and automated compliance assessment using an LLM-as-a-judge. We focus on three policy-relevant domains explicitly addressed in OpenAI's usage policies: Romantic, Cybersecurity, and Academic GPTs. We validate…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAI in Service Interactions
