A Survey on the Honesty of Large Language Models

Siheng Li; Cheng Yang; Taiqiang Wu; Chufan Shi; Yuji Zhang; Xinyu Zhu; Zesen Cheng; Deng Cai; Mo Yu; Lemao Liu; Jie Zhou; Yujiu Yang; Ngai Wong; Xixin Wu; Wai Lam

arXiv:2409.18786·cs.CL·September 30, 2024

TL;DR

This survey reviews the importance of honesty in large language models, examining current challenges, evaluation methods, and strategies for enhancing truthful behavior to better align with human values.

Contribution

It provides a comprehensive overview of honesty in LLMs, clarifies definitions, evaluates existing approaches, and suggests directions for future research.

Findings

1. Current LLMs still exhibit significant dishonest behaviors.
2. Evaluation of honesty in LLMs is complex due to varying definitions.
3. Strategies for improving honesty are discussed and analyzed.

Abstract

Honesty is a fundamental principle for aligning large language models (LLMs) with human values, requiring these models to recognize what they know and don't know and to faithfully express their knowledge. Despite their promise, current LLMs still exhibit significant dishonest behaviors, such as confidently presenting wrong answers or failing to express what they know. In addition, research on the honesty of LLMs faces challenges of its own, including varying definitions of honesty, difficulties in distinguishing between known and unknown knowledge, and a lack of comprehensive understanding of related research. To address these issues, we provide a survey on the honesty of LLMs, covering its clarification, evaluation approaches, and strategies for improvement. Moreover, we offer insights for future research, aiming to inspire further exploration in this important area.

Taxonomy

Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)