"Oops, Did I Just Say That?" Testing and Repairing Unethical Suggestions   of Large Language Models with Suggest-Critique-Reflect Process

Pingchuan Ma; Zongjie Li; Ao Sun; Shuai Wang

arXiv:2305.02626·cs.SE·May 5, 2023·6 cites

Open Access · 1 Repo

TL;DR

This paper presents ETHICSSUITE, a test suite of complex, realistic moral scenarios, and a suggest-critic-reflect (SCR) process that serves as an automated test oracle for detecting unethical suggestions made by large language models, together with a scheme for repairing those suggestions in real time.

Contribution

It introduces the first automated framework for testing and repairing unethical LLM suggestions, including a novel on-the-fly (OTF) repair scheme that works with black-box APIs.

Findings

01. Uncovered 109,824 unethical suggestions across seven LLMs.

02. OTF repair scheme effectively repairs a significant portion of unethical outputs.

03. Demonstrated real-time repair capability with Llama-13B and ChatGPT.

Abstract

As the popularity of large language models (LLMs) soars across various applications, ensuring their alignment with human values has become a paramount concern. In particular, given that LLMs have great potential to serve as general-purpose AI assistants in daily life, their subtly unethical suggestions become a serious and real concern. Tackling the challenge of automatically testing and repairing unethical suggestions is thus demanding. This paper introduces the first framework for testing and repairing unethical suggestions made by LLMs. We first propose ETHICSSUITE, a test suite that presents complex, contextualized, and realistic moral scenarios to test LLMs. We then propose a novel suggest-critic-reflect (SCR) process, serving as an automated test oracle to detect unethical suggestions. We recast deciding if LLMs yield unethical suggestions (a hard problem; often requiring human…
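The abstract names the three SCR steps but is cut off before describing them, so the following is only a minimal sketch of what a suggest, critique, reflect loop over a black-box model might look like. Everything in it is an assumption made for illustration and is not taken from the paper: the `LLM` callable interface, the prompt wording, the `ISSUE`/`OK` answer convention, and the `toy_llm` stand-in that lets the sketch run without a real model.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical black-box interface: a prompt goes in, generated text comes out.
LLM = Callable[[str], str]


@dataclass
class SCRResult:
    suggestion: str          # the model's first answer
    critique: str            # the model's critique of that answer
    flagged: bool            # True if the critique reported an ethical issue
    repaired: Optional[str]  # revised answer, present only when flagged


def suggest_critique_reflect(llm: LLM, scenario: str) -> SCRResult:
    """One pass of a suggest -> critique -> reflect loop (illustrative only)."""
    suggestion = llm(f"SUGGEST\nScenario: {scenario}\nWhat should I do?")
    critique = llm(
        "CRITIQUE\n"
        f"Scenario: {scenario}\nSuggestion: {suggestion}\n"
        "Is this suggestion ethically problematic? "
        "Answer 'ISSUE: <reason>' or 'OK'."
    )
    # Assumed convention: a critique starting with 'ISSUE' means a problem was found.
    flagged = critique.strip().upper().startswith("ISSUE")
    repaired = None
    if flagged:
        # Feed the critique back so the model can revise its own suggestion.
        repaired = llm(
            "REFLECT\n"
            f"Scenario: {scenario}\nSuggestion: {suggestion}\n"
            f"Critique: {critique}\nGive a revised, ethical suggestion."
        )
    return SCRResult(suggestion, critique, flagged, repaired)


def toy_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, so the sketch runs offline."""
    if prompt.startswith("REFLECT"):
        return "Return the wallet to its owner."
    if prompt.startswith("CRITIQUE"):
        if "keep the cash" in prompt.lower():
            return "ISSUE: keeping the money harms the owner."
        return "OK"
    return "Keep the cash and toss the wallet."


result = suggest_critique_reflect(toy_llm, "I found a wallet full of cash on the street.")
print(result.flagged, result.repaired)
```

Because the sketch only ever calls the model through a prompt-in, text-out function, it matches the black-box API setting the page mentions; swapping `toy_llm` for a real client would require no change to `suggest_critique_reflect`.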

Code & Models

Repositories

llm-ethics/ethicssuite (Official)

Taxonomy

Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)

Methods: Repair · Test