Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity
Terry Yue Zhuo, Yujin Huang, Chunyang Chen, Zhenchang Xing

TL;DR
This paper red-teams ChatGPT to identify ethical risks in the four areas named in its title (bias, robustness, reliability, and toxicity), highlights gaps in current benchmarks, and proposes considerations for responsible AI development.
Contribution
The paper systematically examines ChatGPT's ethical vulnerabilities from multiple perspectives and demonstrates the limitations of existing benchmarks in addressing these risks.
Findings
Significant ethical risks remain unaddressed by current benchmarks.
ChatGPT exhibits biases, toxicity, and robustness issues in various scenarios.
The study provides insights for designing more responsible and ethical large language models.
Abstract
Recent breakthroughs in natural language processing (NLP) have permitted the synthesis and comprehension of coherent text in an open-ended way, therefore translating the theoretical algorithms into practical applications. The large language models (LLMs) have significantly impacted businesses such as report summarization software and copywriters. Observations indicate, however, that LLMs may exhibit social prejudice and toxicity, posing ethical and societal dangers of consequences resulting from irresponsibility. Large-scale benchmarks for accountable LLMs should consequently be developed. Although several empirical investigations reveal the existence of a few ethical difficulties in advanced LLMs, there is little systematic examination and user study of the risks and harmful behaviors of current LLM usage. To further educate future efforts on constructing ethical LLMs responsibly, we…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI
