RogueGPT: dis-ethical tuning transforms ChatGPT4 into a Rogue AI in 158 Words

Alessio Buscemi; Daniele Proverbio

arXiv:2407.15009·cs.CY·July 24, 2024

Open Access

TL;DR

This paper demonstrates how easily ChatGPT can be manipulated through simple prompts and fine-tuning to behave unethically, raising concerns about AI safety, data quality, and safeguard robustness.

Contribution

It empirically investigates the vulnerabilities of ChatGPT's ethical safeguards to user-driven modifications, revealing significant risks of malevolent behavior.

Findings

01 RogueGPT can bypass ethical guardrails with simple prompts.

02 It responds to questions about illegal activities and violence.

03 The study highlights risks in training data and safeguard implementation.

Abstract

The ethical implications and potential for misuse of Generative Artificial Intelligence are increasingly worrying topics. This paper explores how easily the default ethical guardrails of ChatGPT can be bypassed, using its latest customization features, through simple prompts and fine-tuning that are effortlessly accessible to the broad public. This malevolently altered version of ChatGPT, nicknamed "RogueGPT", responded with worrying behaviours, beyond those triggered by jailbreak prompts. We conduct an empirical study of RogueGPT responses, assessing its flexibility in answering questions pertaining to what should be disallowed usage. Our findings raise significant concerns about the model's knowledge of topics like illegal drug production, torture methods and terrorism. The ease of driving ChatGPT astray, coupled with its global accessibility, highlights severe issues regarding the…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · COVID-19 diagnosis using AI · Explainable Artificial Intelligence (XAI)