RogueGPT: dis-ethical tuning transforms ChatGPT4 into a Rogue AI in 158 Words
Alessio Buscemi, Daniele Proverbio

TL;DR
This paper demonstrates how easily ChatGPT can be manipulated through simple prompts and fine-tuning to behave unethically, raising concerns about AI safety, data quality, and safeguard robustness.
Contribution
It empirically investigates the vulnerabilities of ChatGPT's ethical safeguards to user-driven modifications, revealing significant risks of malevolent behavior.
Findings
RogueGPT can bypass ethical guardrails with simple prompts.
RogueGPT answers questions about illegal activities and violence, which should be disallowed usage.
The study highlights risks in training data and safeguard implementation.
Abstract
The ethical implications and potential for misuse of Generative Artificial Intelligence are increasingly worrying topics. This paper explores how easily the default ethical guardrails of ChatGPT can be bypassed using its latest customization features, through simple prompts and fine-tuning that are effortlessly accessible to the general public. This malevolently altered version of ChatGPT, nicknamed "RogueGPT", exhibited worrying behaviours beyond those triggered by jailbreak prompts. We conduct an empirical study of RogueGPT's responses, assessing its flexibility in answering questions pertaining to what should be disallowed usage. Our findings raise significant concerns about the model's knowledge of topics like illegal drug production, torture methods and terrorism. The ease of driving ChatGPT astray, coupled with its global accessibility, highlights severe issues regarding the…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · COVID-19 diagnosis using AI · Explainable Artificial Intelligence (XAI)
