Adding guardrails to advanced chatbots

Yanchen Wang; Lisa Singh

arXiv:2306.07500·cs.CY·June 14, 2023·6 cites

Open Access

TL;DR

This paper evaluates ChatGPT's fairness and biases across various tasks, highlighting the need for mitigation strategies and impartial review panels to enhance safety and equity in advanced chatbots.

Contribution

It provides an analysis of ChatGPT's strengths and biases, proposing strategies and the establishment of review panels to improve fairness and safety.

Findings

01 ChatGPT is a fair search engine for tested tasks
02 Biases are present in text and code generation
03 Small prompt changes affect fairness levels

Abstract

Generative AI models continue to become more powerful. The launch of ChatGPT in November 2022 has ushered in a new era of AI. ChatGPT and other similar chatbots have a range of capabilities, from answering student homework questions to creating music and art. There are already concerns that humans may be replaced by chatbots for a variety of jobs. Because of the wide spectrum of data chatbots are built on, we know that they will have human errors and human biases built into them. These biases may cause significant harm and/or inequity toward different subpopulations. To understand the strengths and weaknesses of chatbot responses, we present a position paper that explores different use cases of ChatGPT to determine the types of questions that are answered fairly and the types that still need improvement. We find that ChatGPT is a fair search engine for the tasks we tested; however, it has…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · AI in Service Interactions