Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity

Terry Yue Zhuo; Yujin Huang; Chunyang Chen; Zhenchang Xing

arXiv:2301.12867 · cs.CL · May 30, 2023 · 170 cites

Open Access

TL;DR

This paper conducts a comprehensive red teaming analysis of ChatGPT to identify ethical risks such as bias, toxicity, and robustness issues, highlighting gaps in current benchmarks and proposing considerations for responsible AI development.

Contribution

It systematically examines ChatGPT's ethical vulnerabilities from multiple perspectives and demonstrates the limitations of existing benchmarks in addressing these risks.

Findings

01. Significant ethical risks remain unaddressed by current benchmarks.

02. ChatGPT exhibits biases, toxicity, and robustness issues in various scenarios.

03. The study provides insights for designing more responsible and ethical large language models.

Abstract

Recent breakthroughs in natural language processing (NLP) have enabled the synthesis and comprehension of coherent text in an open-ended way, thereby translating theoretical algorithms into practical applications. Large language models (LLMs) have significantly impacted businesses such as report summarization software and copywriting. Observations indicate, however, that LLMs may exhibit social prejudice and toxicity, posing ethical and societal dangers when they are deployed irresponsibly. Large-scale benchmarks for accountable LLMs should consequently be developed. Although several empirical investigations reveal a few ethical difficulties in advanced LLMs, there has been little systematic examination or user study of the risks and harmful behaviors of current LLM usage. To further inform future efforts to construct ethical LLMs responsibly, we…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI