In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT

Xinyue Shen; Zeyuan Chen; Michael Backes; Yang Zhang

arXiv:2304.08979·cs.CR·October 6, 2023·71 cites

Open Access

TL;DR

This study systematically evaluates ChatGPT's reliability across multiple domains and question types, revealing variability in performance, vulnerabilities to adversarial inputs, and the influence of system roles on answer accuracy.

Contribution

It provides the first large-scale measurement of ChatGPT's reliability, highlighting domain-specific weaknesses and the impact of system roles and adversarial attacks.

Findings

01 ChatGPT underperforms in law and science questions.

02 System roles can subtly influence reliability.

03 Adversarial examples can significantly reduce accuracy.

Abstract

The way users acquire information is undergoing a paradigm shift with the advent of ChatGPT. Unlike conventional search engines, ChatGPT retrieves knowledge from the model itself and generates answers for users. ChatGPT's impressive question-answering (QA) capability has attracted more than 100 million users within a short period of time but has also raised concerns regarding its reliability. In this paper, we perform the first large-scale measurement of ChatGPT's reliability in the generic QA scenario with a carefully curated set of 5,695 questions across ten datasets and eight domains. We find that ChatGPT's reliability varies across different domains, especially underperforming in law and science questions. We also demonstrate that system roles, originally designed by OpenAI to allow users to steer ChatGPT's behavior, can impact ChatGPT's reliability in an imperceptible way. We…

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)