Why Does ChatGPT Fall Short in Providing Truthful Answers?

Shen Zheng; Jie Huang; Kevin Chen-Chuan Chang

arXiv:2304.10513·cs.CL·December 5, 2023·37 cites

TL;DR

This paper investigates why ChatGPT struggles to provide truthful answers. It focuses on factuality failures tied to knowledge memorization and knowledge recall, and proposes strategies to improve the model's factual accuracy.

Contribution

It provides a detailed analysis of ChatGPT's factuality failures and suggests enhancement strategies based on external knowledge and knowledge-recall cues.

Findings

01. Factuality is the primary failure mode in ChatGPT's answers.

02. Augmenting the model with external knowledge improves factual accuracy.

03. Providing cues for knowledge recall improves model performance.

Abstract

Recent advancements in large language models, such as ChatGPT, have demonstrated significant potential to impact various aspects of human life. However, ChatGPT still faces challenges in providing reliable and accurate answers to user questions. To better understand the model's particular weaknesses in providing truthful answers, we embark on an in-depth exploration of open-domain question answering. Specifically, we undertake a detailed examination of ChatGPT's failures, which we categorize into four types: comprehension, factuality, specificity, and inference. We further pinpoint factuality as the largest contributor to these failures and identify two critical abilities associated with factuality: knowledge memorization and knowledge recall. Through experiments focusing on factuality, we propose several potential enhancement strategies. Our findings suggest that augmenting the model with granular external knowledge…
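The abstract separates two factuality-related abilities, knowledge memorization and knowledge recall, and the findings name two matching levers: supplying external knowledge and supplying recall cues. The paper's own prompts are not reproduced on this page, so the block below is only a minimal hypothetical sketch, assuming a plain-text prompt in which supporting facts and an optional hint are placed before the question. The function `build_prompt`, its parameters, and the `Relevant facts:` / `Hint:` labels are illustrative assumptions, not the authors' method.

```python
from typing import Optional, Sequence


def build_prompt(
    question: str,
    knowledge: Optional[Sequence[str]] = None,
    recall_cue: Optional[str] = None,
) -> str:
    """Assemble a plain-text QA prompt (hypothetical format, not the paper's).

    knowledge:  granular external facts to prepend, one per line.
    recall_cue: a short hint meant to help the model recall what it already knows.
    """
    lines = []
    if knowledge:
        # External-knowledge augmentation: place supporting facts before the question.
        lines.append("Relevant facts:")
        lines.extend(f"- {fact}" for fact in knowledge)
    if recall_cue:
        # Recall cue: a nudge toward knowledge the model may already have memorized.
        lines.append(f"Hint: {recall_cue}")
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        build_prompt(
            "What is the capital of Australia?",
            knowledge=["Canberra is the capital city of Australia."],
            recall_cue="Consider which Australian city was purpose-built as the capital.",
        )
    )
```

Keeping the two inputs as separate optional arguments makes it easy to compare a bare question, a knowledge-augmented prompt, and a cue-only prompt side by side.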

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Expert finding and Q&A systems